OPTIMAL AND ROBUST MEMORYLESS DISCRIMINATION
FROM DEPENDENT OBSERVATIONS


CHAPTER 1


INTRODUCTION 


In this thesis we will consider various special cases of the binary hypothesis testing
problem, which may be informally described as follows: one observes some random event
and wishes to decide, on the basis of the observation, between two hypotheses which
concern the nature of the random event. In all cases that we will consider, the random
event is modeled by a discrete-time random process $\{X_i\}$, and the two hypotheses concern
the probability distribution of the process. More specifically, we consider the following
two hypotheses:


$$H_0:\ \{X_i\}_{i=1}^{n} \text{ has a density } f_0^{(n)}(x)$$
$$H_1:\ \{X_i\}_{i=1}^{n} \text{ has a density } f_1^{(n)}(x) \qquad (1.1)$$


where $x$ denotes the $n$-tuple $(x_1, \dots, x_n)$. A decision rule for this hypothesis testing
problem is a measurable mapping $d$ which maps the observation space $\mathbb{R}^n$ into the set
$\{0, 1\}$, the interpretation being that if $x$ is observed and $d(x) = i$, then one decides that
$H_i$ is true. Such a mapping may also be referred to in this thesis as a test, a receiver,
or a discriminator. If $f_1^{(n)}$ is absolutely continuous with respect to $f_0^{(n)}$ in the sense
that $f_0^{(n)}(x) = 0 \Rightarrow f_1^{(n)}(x) = 0$, then the optimal decision rule for the Bayesian or
Neyman-Pearson criterion [5] is given by the likelihood ratio test (LRT):


$$d(x) = 1 \quad \text{iff} \quad \frac{f_1^{(n)}(x)}{f_0^{(n)}(x)} > \eta,$$


where $x$ is the observed vector, and the choice of the threshold $\eta$ depends on the
particular criterion and the details of the problem. If the observed random process
is independent and identically distributed (iid) under either hypothesis, then only the
marginal densities are involved, and the LRT becomes


$$d(x) = 1 \quad \text{iff} \quad \prod_{i=1}^{n} \frac{f_1(x_i)}{f_0(x_i)} > \eta,$$


where $f_0$ is the marginal density under $H_0$ and $f_1$ is the marginal density under $H_1$.
Taking logarithms on each side, we may also write this in the form


$$d(x) = 1 \quad \text{iff} \quad \sum_{i=1}^{n} \log \frac{f_1(x_i)}{f_0(x_i)} > \log \eta. \qquad (1.2)$$
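As a minimal sketch of the iid log-LRT (1.2), the snippet below sums the per-sample log-likelihood ratios and compares the sum with $\log\eta$. The Gaussian marginals $f_0 = N(0,1)$ and $f_1 = N(1,1)$ and the helper names `gaussian_logpdf` and `iid_log_lrt` are illustrative assumptions, not part of the thesis.

```python
import math
from typing import Callable, Sequence, Tuple


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log of the N(mean, std^2) density at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def iid_log_lrt(
    xs: Sequence[float],
    log_f0: Callable[[float], float],
    log_f1: Callable[[float], float],
    eta: float,
) -> Tuple[int, float]:
    """Decide H1 (return 1) iff sum_i log[f1(x_i)/f0(x_i)] > log(eta), as in (1.2)."""
    stat = sum(log_f1(x) - log_f0(x) for x in xs)
    return (1 if stat > math.log(eta) else 0), stat


if __name__ == "__main__":
    log_f0 = lambda x: gaussian_logpdf(x, 0.0, 1.0)
    log_f1 = lambda x: gaussian_logpdf(x, 1.0, 1.0)
    # For these marginals the per-sample log ratio is x - 1/2.
    decision, stat = iid_log_lrt([0.2, 1.5, 0.9], log_f0, log_f1, eta=1.0)
    print(decision, round(stat, 6))  # 1 1.1
```

Only the marginal densities enter, which is exactly why (1.2) is so convenient when the iid assumption holds.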


Because of the simple form of the LRT given by (1.2), this result has proven to
be extremely useful in practice whenever the processes can be assumed to be
iid. However, if at least one of the processes is dependent, the LRT may
be of little practical value. Consider two such situations. First, one of the
$n$-dimensional densities involved may lack a closed-form expression, so that the
LRT also lacks a closed-form expression. Such is generally the case for the Rayleigh
distribution in Appendix A. In this situation the LRT cannot be implemented. Second, one
may wish to implement a test which does not require an assumption on the particular
form of the $n$-dimensional densities, such as is required for the LRT. For example, if one
wants to design a test based on experimental data, then it is desirable to base such a test
on the empirical marginal densities and possibly some of the lower-order moments, since
the amount of data necessary to establish the $n$-dimensional empirical densities might be
unmanageable. Since the LRT requires explicit knowledge of the $n$-dimensional
densities, it is clearly inappropriate in this case.

If one chooses not to use an LRT, one may proceed by specifying a
test structure and then attempting to determine the optimal test out of the class of all tests
which have that structure. A test structure which has appeared often in the literature
is the following:

$$d(x) = 1 \quad \text{iff} \quad T_n(x) \in A \qquad (1.3)$$

where

$$T_n(x) = \sum_{i=1}^{n} \psi(x_i). \qquad (1.4)$$

Here $\psi$ is a Borel measurable function, which will often be referred to as a nonlinearity,
and $A$ is a Borel subset of the real line, which will be called the critical region. The test
statistic $T_n$ has been referred to in the literature as a zero-memory nonlinearity (ZNL).
Note that in the case where the process is iid under either hypothesis, the log-LRT
(1.2) has this form with $\psi(x) = \log[f_1(x)/f_0(x)]$ and $A = (\log\eta, \infty)$. If, on the other hand, the
process is not iid under at least one of the hypotheses, then the LRT will involve memory,
and consequently a memoryless decision rule will be suboptimal. Nevertheless, there are
some advantages to using a memoryless decision rule, particularly when simplicity of
implementation is important. In this thesis we consider in detail the use of various
memoryless decision rules of the form (1.3) as applied to the discrimination problem
(1.1), under the assumption that the process is stationary under either hypothesis.
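The memoryless test (1.3)-(1.4) with a single-threshold critical region $A = [n\gamma, \infty)$ can be sketched as follows, with $\gamma$ calibrated by Monte Carlo simulation under $H_0$ so that the false-alarm probability is close to a target $\alpha$. The $H_0$ model (a dependent moving average $X_i = e_i + 0.5\,e_{i-1}$ with iid $N(0,1)$ innovations), the choice $\psi(x) = x^2$, and all function names are hypothetical choices made only for illustration.

```python
import math
import random
from typing import Callable, List


def znl_statistic(xs: List[float], psi: Callable[[float], float]) -> float:
    """T_n(x) = sum_i psi(x_i), the memoryless statistic (1.4)."""
    return sum(psi(x) for x in xs)


def sample_ma1(rng: random.Random, n: int, b: float = 0.5) -> List[float]:
    """Hypothetical H0 model: X_i = e_i + b*e_{i-1} with iid N(0,1) innovations."""
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[i + 1] + b * e[i] for i in range(n)]


def calibrate_gamma(psi, sampler, n: int, alpha: float, trials: int = 4000, seed: int = 0) -> float:
    """Pick gamma so that about a fraction alpha of simulated H0 statistics satisfy T_n >= n*gamma."""
    rng = random.Random(seed)
    stats = sorted(znl_statistic(sampler(rng, n), psi) for _ in range(trials))
    k = min(trials - 1, math.ceil((1.0 - alpha) * trials))
    return stats[k] / n


def decide(xs: List[float], psi, gamma: float) -> int:
    """d(x) = 1 iff T_n(x) lies in A = [n*gamma, infinity)."""
    return 1 if znl_statistic(xs, psi) >= len(xs) * gamma else 0


if __name__ == "__main__":
    psi = lambda x: x * x
    n, alpha = 50, 0.05
    gamma = calibrate_gamma(psi, sample_ma1, n, alpha)
    rng = random.Random(1)  # fresh samples for checking the false-alarm rate
    false_alarms = sum(decide(sample_ma1(rng, n), psi, gamma) for _ in range(4000))
    print(f"gamma = {gamma:.3f}, estimated P0 = {false_alarms / 4000:.3f}")
```

Because the statistic is a plain running sum, the receiver needs no memory of past samples; all of the dependence in the data only affects how the threshold must be set.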

In order to implement a test of the form (1.3), one must specify the nonlinearity
$\psi$ and the critical region $A$. Most often, $A$ will be an interval, such as

$$A = [n\gamma, \infty) \qquad (\gamma < \infty),$$

which involves a single threshold $\gamma$. In this case, one would probably proceed by
specifying $\psi$ first and then choosing the value of the threshold $\gamma$ through simulation or
actual testing to adjust the error probabilities to their desired values. Before determining
the nonlinearity $\psi$, however, one must decide on a performance criterion, with respect to
which $\psi$ will then be chosen to be optimal. Although the most natural
and useful criterion is that of the error probabilities, for a test of the form (1.3) one
cannot in most cases obtain a closed-form expression for the error probabilities, and thus
another performance measure may be more useful. Such is the case for the results of this
thesis, where we consider performance measures which involve the mean and asymptotic
variance of the test statistic $T_n$ under the two hypotheses. For stationary processes, the
mean of $T_n$ under $H_i$ is given by

$$E_i T_n(X) = E_i \sum_{j=1}^{n} \psi(X_j) = n E_i \psi(X_1) \qquad (1.5)$$

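and the variance under $H_i$ is

$$\begin{aligned}
\mathrm{Var}_i T_n(X) &= E_i[T_n(X)^2] - [E_i T_n(X)]^2 \\
&= E_i \sum_{j=1}^{n}\sum_{k=1}^{n} \psi(X_j)\psi(X_k) - n^2 [E_i \psi(X_1)]^2 \\
&= \sum_{j=1}^{n} \mathrm{Var}_i \psi(X_j) + 2\sum_{j=1}^{n-1}\sum_{k=j+1}^{n} \mathrm{Cov}_i[\psi(X_j), \psi(X_k)] \\
&= n\,\mathrm{Var}_i \psi(X_1) + 2\sum_{j=1}^{n-1}\sum_{k=j+1}^{n} \mathrm{Cov}_i[\psi(X_1), \psi(X_{k-j+1})].
\end{aligned}$$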

In our notation, $E_i$, $\mathrm{Var}_i$, and $\mathrm{Cov}_i$ denote, respectively, the expectation, variance, and
covariance operations under hypothesis $H_i$. We now define two functionals $\mu_i(\psi)$ and
$\sigma_i^2(\psi)$ which will appear in the work which follows. Define

$$\mu_i(\psi) = E_i \psi(X_1) \qquad (1.7)$$

and, in cases where the sum exists, define

$$\sigma_i^2(\psi) = \mathrm{Var}_i \psi(X_1) + 2\sum_{j=1}^{\infty} \mathrm{Cov}_i[\psi(X_1), \psi(X_{j+1})]. \qquad (1.8)$$

Thus we have $E_i T_n = n\mu_i$, and we note also that if the sum in (1.8) converges, then
$\left|\frac{1}{n}\mathrm{Var}_i T_n(X) - \sigma_i^2\right| \to 0$ as $n \to \infty$ (since, by stationarity,
$\frac{1}{n}\mathrm{Var}_i T_n(X) = \mathrm{Var}_i \psi(X_1) + 2\sum_{j=1}^{n-1}(1 - j/n)\,\mathrm{Cov}_i[\psi(X_1), \psi(X_{j+1})]$),
so that $\mathrm{Var}_i T_n(X) \approx n\sigma_i^2$ for large values of $n$. Thus $n\sigma_i^2$ is the asymptotic variance of $T_n$
under hypothesis $H_i$. These functionals, or "moments," of the nonlinearity $\psi$ have rather
convenient expressions in terms of the marginal and joint densities of the processes, and by
considering performance measures involving these moments as opposed to the error
probabilities, we shall find that the analysis becomes much more manageable. Note also
that for such performance measures, the $n$-dimensional densities are not involved for $n > 2$.
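To illustrate (1.8) and the approximation $\mathrm{Var}_i T_n \approx n\sigma_i^2$, the sketch below assumes a Gaussian AR(1) process $X_t = \rho X_{t-1} + e_t$ with unit-variance innovations and takes $\psi(x) = x$. Its lag-$j$ covariance is $\rho^{j}/(1-\rho^2)$, so (1.8) sums to $1/(1-\rho)^2$, and the exact value of $\frac{1}{n}\mathrm{Var}\,T_n$ obtained from the finite-$n$ variance formula approaches this limit. The process and the function names are assumptions for illustration only.

```python
def ar1_cov(rho: float, lag: int) -> float:
    """Lag-`lag` covariance of a stationary AR(1) process with unit innovation variance."""
    return rho ** lag / (1.0 - rho ** 2)


def sigma2_series(rho: float, terms: int = 500) -> float:
    """Truncated version of (1.8) with psi(x) = x."""
    return ar1_cov(rho, 0) + 2.0 * sum(ar1_cov(rho, j) for j in range(1, terms + 1))


def var_tn_over_n(rho: float, n: int) -> float:
    """Exact (1/n) Var(T_n): there are n - l pairs (j, k) with lag k - j = l."""
    total = n * ar1_cov(rho, 0) + 2.0 * sum((n - l) * ar1_cov(rho, l) for l in range(1, n))
    return total / n


if __name__ == "__main__":
    rho = 0.5
    print(sigma2_series(rho), 1.0 / (1.0 - rho) ** 2)  # both approximately 4.0
    for n in (10, 100, 1000):
        print(n, var_tn_over_n(rho, n))  # approaches 4.0 from below
```

Note that with positive correlation the asymptotic variance (here 4) is several times the marginal variance $1/(1-\rho^2) \approx 1.33$, which is why ignoring dependence can badly misjudge the spread of $T_n$.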

In the chapters that follow, the performance measures which are derived are based
on central limit theory, and therefore it will be necessary to restrict the class of processes
which will be considered. In particular, we require that the processes involved exhibit
some kind of asymptotic independence so that central limit theory may be applied. The
type of asymptotic independence which is appropriate for the work here is that which
is defined by various mixing conditions. Let $\mathcal{F}_a^b$ denote the $\sigma$-field of events generated
by $\{X_i,\ a \le i \le b\}$. Then the process $\{X_i\}$ is said to be strong mixing if there exists a
sequence $\{\alpha_n\}$ such that $\alpha_n \to 0$ and

$$|P(A \cap B) - P(A)P(B)| \le \alpha_n \qquad (1.9)$$

for any events $A \in \mathcal{F}_{-\infty}^{k}$, $B \in \mathcal{F}_{k+n}^{\infty}$ (and any $k$). If it is also true that

$$|P(A \cap B) - P(A)P(B)| \le \phi_n P(A) \qquad (1.10)$$

for some sequence $\{\phi_n\}$ with $\phi_n \to 0$, then the process is called $\phi$-mixing. The $\phi$-mixing
condition clearly implies the strong mixing condition. Finally, define the process $\{X_i\}$ to
be $m$-dependent if for every integer $k$ the $\sigma$-fields $\mathcal{F}_{-\infty}^{k}$ and $\mathcal{F}_{k+m+1}^{\infty}$ are independent.

Note that $m$-dependence is a special case of $\phi$-mixing with $\phi_n = 0$ for $n > m$. We
include $m$-dependence because it is easier to work with analytically and because in
certain situations it can approximate the $\phi$-mixing condition well if $m$ is sufficiently large.
Because mixing conditions are defined in terms of the underlying $\sigma$-fields of events, the
conditions are preserved by memoryless transformations, so that $\{g(X_i)\}$ will satisfy the
same mixing condition as $\{X_i\}$, provided $g$ is measurable. Central limit theorems have
been proved for strong mixing and $\phi$-mixing processes, and one such theorem is given
here as Theorem 1.

Theorem 1. Let $\{X_i\}$ be a stationary $\phi$-mixing process with $\sum_{n=1}^{\infty} \phi_n^{1/2} < \infty$, and
let $\psi$ be a measurable real-valued function such that $E|\psi(X_1)| < \infty$ and $E\psi(X_1)^2 < \infty$.
Then the series in (1.8) converges absolutely, and $[T_n(X) - n\mu(\psi)]/\sqrt{n\sigma^2(\psi)}$ converges in
distribution to a standard normal random variable (having zero mean and unit variance),
provided $\sigma^2(\psi) > 0$.

For a proof, see [2] and [4]. In Chapters 2 and 3, we shall assume the $m$-dependent or
$\phi$-mixing condition and make reference to Theorem 1. In Chapter 4 we shall assume
the strong mixing condition and shall also state there a central limit theorem for strong
mixing processes.

A performance measure which has received a lot of attention in the literature
is the efficacy of a test, which is based on the concept of asymptotic relative
efficiency (ARE). In order to define the efficacy of a test, consider the following problem:

$$\begin{aligned} H_0&:\ X_i = N_i \\ H_1&:\ X_i = N_i + \theta \end{aligned} \qquad (1.11)$$


where the process $\{N_i\}$ is a stationary process representing noise. Thus
under $H_0$ we observe only noise, while under $H_1$ we observe a constant signal $\theta$ plus
noise. Let $f_N^{(n)}$ denote the $n$-dimensional density of the noise process. Note that this
situation is a special case of the problem (1.1) with $f_0^{(n)}(x) = f_N^{(n)}(x)$ and $f_1^{(n)}(x) =
f_N^{(n)}(x - \theta)$, where $x - \theta$ denotes $(x_1 - \theta, \dots, x_n - \theta)$. If we consider a test involving the test
statistic $T_n^{(1)}$, then for a given signal strength $\theta$ and fixed error probabilities $\alpha$, $\beta$ under $H_0$
and $H_1$, respectively, a minimum sample size $n_1$ is required. Under similar conditions but
with a different test statistic $T_n^{(2)}$, a sample size $n_2$ is required. The ARE of $T^{(2)}$ with
respect to $T^{(1)}$ may now be defined as the limit of the ratio $n_1/n_2$ as $\theta \to 0$ with $\alpha$ and $\beta$
fixed. The efficacy of $T^{(1)}$ is defined to be


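$$\eta_1 = \lim_{n\to\infty} \frac{\left[\frac{d}{d\theta} E_\theta T_n^{(1)}\big|_{\theta=0}\right]^2}{n\,\mathrm{Var}_0 T_n^{(1)}} \qquad (1.12)$$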


if the limit exists. Here the subscripts on the expectation and variance operators denote
the value of $\theta$ under which the operations are performed; e.g., $\mathrm{Var}_0$ denotes the variance
under $H_0$. The importance of the efficacy stems from the Pitman-Noether theorem,
which states that if certain conditions are satisfied individually by $T^{(1)}$ and $T^{(2)}$, then
the ARE of $T^{(2)}$ with respect to $T^{(1)}$ is equal to the ratio $\eta_2/\eta_1$ of the two efficacies. Hence
$T^{(1)}$ may be considered a better test than $T^{(2)}$ under the ARE criterion if $\eta_1 > \eta_2$. The
conditions on $T^{(1)}$ which are necessary for the Pitman-Noether theorem are as follows:


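(i) $\frac{d}{d\theta} E_\theta T_n^{(1)}\big|_{\theta=0} > 0$

(ii) $\eta_1 > 0$

(iii) $\displaystyle \lim_{n\to\infty} \frac{\frac{d}{d\theta} E_\theta T_n^{(1)}\big|_{\theta=\theta_n}}{\frac{d}{d\theta} E_\theta T_n^{(1)}\big|_{\theta=0}} = \lim_{n\to\infty} \frac{\mathrm{Var}_{\theta_n} T_n^{(1)}}{\mathrm{Var}_0 T_n^{(1)}} = 1$, where $\theta_n = K/\sqrt{n}$ for some constant $K$.

(iv) $[T_n^{(1)} - E_\theta T_n^{(1)}]/\sqrt{\mathrm{Var}_\theta T_n^{(1)}}$ converges in distribution to a standard normal random
variable as $n \to \infty$ for all $\theta \in \{0, \theta_n\}$.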


Similar conditions are required of $T^{(2)}$. Condition (iv) requires that the test statistic
$T^{(1)}$ be asymptotically normal. In cases where the noise process is iid, it is straightforward
to apply a central limit theorem to show that condition (iv) holds for the
test statistic in (1.4), and Miller and Thomas [11] have derived the nonlinearity $\psi$ which
maximizes the efficacy in such a situation. They also generalize to the case of a nonconstant
signal involving the test statistic $T_n(x) = \sum_{i=1}^{n} \psi_i(x_i)$, for which the nonlinearity
varies with time. In Poor and Thomas [1], the optimal nonlinearity is derived, still with
respect to the efficacy performance measure, for the case where the noise process is
$m$-dependent. Halverson and Wise [2] show how to correctly extend this result to the more
general case of $\phi$-mixing noise. In either of these latter two cases, Theorem 1 is required
in order to demonstrate that the test statistic is asymptotically normal.
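For iid noise, $E_\theta T_n = nE\,\psi(N_1+\theta)$ and $\mathrm{Var}_0 T_n = n\,\mathrm{Var}\,\psi(N_1)$, so (1.12) reduces to $[\frac{d}{d\theta}E\,\psi(N_1+\theta)|_{\theta=0}]^2/\mathrm{Var}\,\psi(N_1)$. The sketch below evaluates this numerically (midpoint quadrature plus a central difference) for standard Gaussian noise, comparing the linear nonlinearity $\psi(x) = x$ with the sign nonlinearity $\psi(x) = \mathrm{sgn}(x)$. The classical values are $1$ and $2/\pi$, so by the Pitman-Noether theorem the ARE of the sign test with respect to the linear test is $2/\pi$ in this case. The iid Gaussian noise model and the helper names are assumptions made for the example.

```python
import math
from typing import Callable

Psi = Callable[[float], float]


def gauss_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expect(psi: Psi, theta: float, half_width: float = 12.0, cells: int = 24000) -> float:
    """E[psi(N + theta)] for N ~ N(0,1), computed as the integral of psi(x) f(x - theta) dx.

    The cell edges include x = 0, so a jump of psi at the origin never falls inside a cell.
    """
    h = 2.0 * half_width / cells
    total = 0.0
    for k in range(cells):
        x = -half_width + (k + 0.5) * h
        total += psi(x) * gauss_pdf(x - theta)
    return total * h


def efficacy_iid(psi: Psi, d: float = 1e-3) -> float:
    """[d/dtheta E psi(N + theta) at 0]^2 / Var psi(N), the iid form of (1.12)."""
    slope = (expect(psi, d) - expect(psi, -d)) / (2.0 * d)
    var0 = expect(lambda x: psi(x) ** 2, 0.0) - expect(psi, 0.0) ** 2
    return slope ** 2 / var0


if __name__ == "__main__":
    linear = lambda x: x
    sign = lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    eta_lin, eta_sign = efficacy_iid(linear), efficacy_iid(sign)
    print(eta_lin, eta_sign, eta_sign / eta_lin, 2.0 / math.pi)
```

For dependent noise the denominator must instead use the asymptotic variance (1.8), which is where Theorem 1 enters.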

In this thesis we shall not confine ourselves to the weak-signal-in-noise problem, but
shall consider the more general discrimination problem (1.1), with the assumption
that the observed process is stationary and satisfies a mixing condition under either
hypothesis. For this type of problem the efficacy performance measure is no longer
appropriate. We shall therefore derive the appropriate performance measures. These
performance measures, which are derived under different problem formulations, are all
of the form

$$\frac{(\mu_1 - \mu_0)^2}{\|(\sigma_0, \sigma_1)\|^2} \qquad (1.13)$$
where the denominator is the square of some "norm" of the vector $(\sigma_0, \sigma_1)$, and where $\mu_i$
and $\sigma_i$ are defined by (1.7) and (1.8). Thus these performance measures are functionals
of the nonlinearity. In Chapter 2 we consider the problem (1.1) under a Neyman-Pearson
formulation. Thus if $P_i$ denotes the probability of error when $H_i$ is true, then
the formulation considered there is to minimize $P_1$ subject to the constraint that $P_0 \le \alpha$.
For this situation, it will be shown that the optimal receiver for large sample sizes (that
is, in an asymptotic sense) is one which maximizes the performance measure

$$S_1 = \frac{(\mu_1 - \mu_0)^2}{\sigma_1^2}. \qquad (1.14)$$

For the reverse situation, when $P_0$ is minimized subject to $P_1 \le \alpha$, the performance
measure is

$$S_0 = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2}. \qquad (1.15)$$

In these two performance measures, the "norm" in the denominator is $\|(\sigma_0, \sigma_1)\| = \sigma_1$
and $\|(\sigma_0, \sigma_1)\| = \sigma_0$, respectively, which, technically speaking, is a pseudo-norm, since
$\|(\sigma_0, \sigma_1)\| = 0$ does not necessarily imply that $(\sigma_0, \sigma_1) = (0, 0)$. Observe the similarity of
the performance measure $S_1$ to the efficacy measure (1.12). This similarity arises from the
fact that both are asymptotic performance measures based on central limit theory. For the
efficacy, however, the assumptions (i)-(iv) are necessary, whereas fewer assumptions are
needed to justify the use of $S_1$. Also in Chapter 2, the nonlinearity which maximizes
the performance measure $S_1$ is shown to satisfy a Fredholm integral equation of the second
kind. It is also shown how the integral equation can be solved using Hilbert-Schmidt
theory. In Chapter 3, we consider again the problem (1.1), this time under a minimax
formulation; that is, we desire to minimize the maximum of $P_0$ and $P_1$. It will be shown
that the optimum test statistic is one which maximizes the performance measure

$$S_2 = \frac{(\mu_1 - \mu_0)^2}{(\sigma_0 + \sigma_1)^2}. \qquad (1.16)$$


The nonlinearity which is optimal for this performance measure is shown to satisfy a
nonlinear integral equation for which a closed-form solution cannot be given; however,
it is shown how the solution can be obtained numerically using an iterative procedure.
By modifying the minimax formulation slightly, it is shown that one can also derive the
performance measure

$$S_3 = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2 + \sigma_1^2}, \qquad (1.17)$$


which has the Euclidean norm in the denominator. It will be shown that the maximization
of $S_3$ leads to a linear integral equation. In Chapter 4, the issue of robustness is
addressed. The approach is that of game theory, or minimax theory, where one tries to
design the optimal receiver to match the worst-case densities chosen out of uncertainty
classes. Results are given there for the performance measures $S_1$ (and consequently $S_0$
as well) and $S_3$. In Chapter 5, the theory is applied to the problem of discrimination
between a Rayleigh density and a lognormal density, where strong correlation is present.
The nonlinearity which maximizes each of the performance measures is computed numerically,
and the performance results from computer simulations are presented. The
simulation results are compared to the results for the receiver which is designed under
the assumption that the processes are iid. Chapter 5 also contains a discussion of the
results of the thesis.
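Given the four moments $\mu_0, \mu_1, \sigma_0, \sigma_1$ of a nonlinearity, the performance measures (1.14)-(1.17) are direct to evaluate. The helper below is a hypothetical convenience function, not code from the thesis; it also makes visible that every measure is unchanged when the nonlinearity is multiplied by a nonzero constant, since $\mu_1 - \mu_0$ and each $\sigma_i$ then scale by the same factor in absolute value.

```python
from typing import Dict


def performance_measures(mu0: float, mu1: float, sigma0: float, sigma1: float) -> Dict[str, float]:
    """S0, S1, S2, S3 from (1.14)-(1.17); sigma0, sigma1 > 0 are asymptotic standard deviations."""
    d2 = (mu1 - mu0) ** 2
    return {
        "S0": d2 / sigma0 ** 2,                  # (1.15): minimize P0 subject to P1 <= alpha
        "S1": d2 / sigma1 ** 2,                  # (1.14): minimize P1 subject to P0 <= alpha
        "S2": d2 / (sigma0 + sigma1) ** 2,       # (1.16): minimax formulation
        "S3": d2 / (sigma0 ** 2 + sigma1 ** 2),  # (1.17): Euclidean norm in the denominator
    }


if __name__ == "__main__":
    print(performance_measures(0.0, 2.0, 1.0, 2.0))
    # {'S0': 4.0, 'S1': 1.0, 'S2': 0.4444444444444444, 'S3': 0.8}
```

Because $(\sigma_0 + \sigma_1)^2 \ge \sigma_0^2 + \sigma_1^2 \ge \max(\sigma_0^2, \sigma_1^2)$, one always has $S_2 \le S_3 \le \min(S_0, S_1)$ for the same nonlinearity.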
CHAPTER 2


THE NEYMAN-PEARSON FORMULATION


2.1 The performance measure $S_1$

In this chapter, we consider in detail the hypothesis testing problem (1.1) under
a Neyman-Pearson formulation. Our informal statement of the problem is the following:

$$\text{minimize } P_1 \quad \text{subject to} \quad P_0 \le \alpha \qquad (2.1)$$

where $P_i$ denotes the probability of error when $H_i$ is true. The statement
of problem (2.1) is informal because it depends implicitly on the sample size $n$:
although we are interested in tests with a fixed sample size, we do not wish to
specify $n$ before we consider the problem (2.1). When we speak of a test or decision rule,
we shall actually mean a family of decision rules (one for each $n$), and in comparing
different tests, we shall not explicitly mention a particular value of $n$. Since for any
reasonable test $P_1 \to 0$ as $n \to \infty$, we may state our problem more accurately in this
way: considering all level-$\alpha$ tests (i.e., $P_0 \le \alpha$ for all $n$), find the test for which the rate
of convergence of $P_1$ to 0 is fastest. We can see now that if for some test $d^{(1)}$ the rate of
convergence is faster than the rate for another test $d^{(2)}$, then there is an integer $N$ such
that $d^{(1)}$ is better than $d^{(2)}$ in the sense implied by (2.1) whenever the sample size $n$ is
greater than $N$. In this section we shall derive a performance measure $S_1$ which specifies
(approximately) the rate at which $P_1$ converges to 0; the connection between this
performance measure and the Neyman-Pearson problem (2.1) should then be clear.

We will restrict our attention to decision rules of the form (1.3) with the
assumption that the test statistic $T_n$ is asymptotically normal under either hypothesis.
Thus under hypothesis $H_i$ we assume that there exist constants $\mu_i$ and $\sigma_i^2 > 0$ such
that $(T_n - n\mu_i)/\sqrt{n\sigma_i^2}$ converges in distribution to a standard normal random variable
as $n \to \infty$. In the case that the test statistic has the form (1.4) and the conditions of
Theorem 1 are satisfied, the constants $\mu_i$ and $\sigma_i^2$ are given by (1.7) and (1.8), respectively.
With this assumption, then, for large values of $n$ the distribution of the test statistic
is approximately normal with mean $n\mu_i$ and variance $n\sigma_i^2$ when $H_i$ is true. Taking
a heuristic approach, one can use this knowledge to choose the critical region $A$ by
treating the decision rule (1.3) as an LRT between two Gaussian
densities. In our case, the two Gaussian densities are


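$$\varphi_0(t) = \frac{1}{\sqrt{2\pi n}\,\sigma_0} \exp\left[-\frac{(t - n\mu_0)^2}{2n\sigma_0^2}\right]$$

$$\varphi_1(t) = \frac{1}{\sqrt{2\pi n}\,\sigma_1} \exp\left[-\frac{(t - n\mu_1)^2}{2n\sigma_1^2}\right]$$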

and the log-LRT is given by

$$d(x) = 1 \quad \text{iff} \quad \log \frac{\varphi_1(T_n(x))}{\varphi_0(T_n(x))} > n\eta, \qquad (2.2)$$


or equivalently,

$$d(x) = 1 \quad \text{iff} \quad \frac{(T_n/n - \mu_0)^2}{\sigma_0^2} - \frac{(T_n/n - \mu_1)^2}{\sigma_1^2} > \gamma, \qquad (2.3)$$

where $\gamma = 2\eta - (2/n)\log(\sigma_0/\sigma_1)$. We can assume without loss of generality that $\mu_0 < \mu_1$:
the case $\mu_1 = \mu_0$ is unlikely to occur in practice and will not be considered, and the
case $\mu_1 < \mu_0$ follows the same procedure with the appropriate sign change. Assume
first that $\sigma_0^2 = \sigma_1^2$. Then the expression on the left side of (2.3) is a linear function
of $T_n$, and it is easy to see that the log-LRT has the form (1.3) with $A = [n\gamma', \infty)$, where
$n\gamma'$ is the value of $T_n$ at which the two sides of (2.3) are equal. The error probabilities
are then given approximately by

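$$P_0 \approx \Phi\left[\frac{\sqrt{n}(\mu_0 - \gamma')}{\sigma_0}\right], \qquad P_1 \approx \Phi\left[\frac{\sqrt{n}(\gamma' - \mu_1)}{\sigma_1}\right], \qquad (2.4)$$

where $\Phi$ denotes the standard normal distribution function,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$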


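Now in order to have $P_0 = \alpha$, we must take $\gamma' = \mu_0 - (\sigma_0/\sqrt{n})\Phi^{-1}(\alpha)$, and substituting
for $\gamma'$ in the expression for $P_1$ (with $\sigma_0 = \sigma_1$) we obtain

$$P_1 \approx \Phi\left[-\Phi^{-1}(\alpha) - \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma_1}\right]. \qquad (2.5)$$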

Now $\Phi(x)$ is an increasing function of $x$, and so to minimize $P_1$ it is necessary to make
the argument of $\Phi$ in (2.5) as small as possible; that is, to make the argument large in
magnitude and negative. The term $-\sqrt{n}(\mu_1 - \mu_0)/\sigma_1$ is negative, since we are assuming
that $\mu_0 < \mu_1$, and it increases in magnitude as $n$ increases, so that $P_1 \to 0$. The quantity
$(\mu_1 - \mu_0)/\sigma_1$ determines the rate at which $P_1$ goes to zero, and we can see that the best
asymptotic performance results when this quantity is maximized. Since it is positive,
maximizing $(\mu_1 - \mu_0)/\sigma_1$ is equivalent to maximizing the quantity

$$S_1 = \frac{(\mu_1 - \mu_0)^2}{\sigma_1^2}. \qquad (2.6)$$
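The approximation (2.5) can be turned into a quick calculator. The sketch below, which assumes the equal-variance case, writes $\sqrt{n}(\mu_1 - \mu_0)/\sigma_1 = \sqrt{nS_1}$ and also inverts (2.5) to obtain the approximate sample size needed for a target $P_1$, $n \approx [\Phi^{-1}(\alpha) + \Phi^{-1}(P_1)]^2/S_1$. This shows directly that doubling $S_1$ roughly halves the required sample size. It uses Python's standard `statistics.NormalDist`; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: cdf and inv_cdf


def p1_approx(s1: float, n: float, alpha: float) -> float:
    """Equation (2.5): P1 ~ Phi[-Phi^{-1}(alpha) - sqrt(n * S1)] (equal variances)."""
    return STD_NORMAL.cdf(-STD_NORMAL.inv_cdf(alpha) - math.sqrt(n * s1))


def required_n(s1: float, alpha: float, p1_target: float) -> float:
    """Invert (2.5) for the (real-valued) sample size giving P1 = p1_target."""
    z = STD_NORMAL.inv_cdf(alpha) + STD_NORMAL.inv_cdf(p1_target)
    return z * z / s1


if __name__ == "__main__":
    print(p1_approx(1.0, 9, 0.05))      # roughly 0.088
    print(required_n(0.5, 0.05, 0.01))
    print(required_n(1.0, 0.05, 0.01))  # half of the previous value
```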

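Assuming now that $\sigma_0^2 > \sigma_1^2$, we will obtain the same performance measure; the
case $\sigma_0^2 < \sigma_1^2$ will not be considered, since the analysis parallels the case $\sigma_0^2 > \sigma_1^2$
and the same result is obtained. In this case the left side of (2.3) is a concave quadratic in
$T_n$, so it exceeds $\gamma$ exactly between its two crossing points. Hence the test (2.3) is identical
to the test (1.3) if $A = [n\gamma_1, n\gamma_2]$, where $n\gamma_1$ and $n\gamma_2$ are the roots of the quadratic in (2.3).
The quantities $\gamma_1$, $\gamma_2$ are given by the expressions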


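$$\gamma_1 = \frac{\sigma_0^2\mu_1 - \sigma_1^2\mu_0 - \sigma_0\sigma_1\sqrt{(\mu_1 - \mu_0)^2 - \gamma(\sigma_0^2 - \sigma_1^2)}}{\sigma_0^2 - \sigma_1^2}$$

$$\gamma_2 = \frac{\sigma_0^2\mu_1 - \sigma_1^2\mu_0 + \sigma_0\sigma_1\sqrt{(\mu_1 - \mu_0)^2 - \gamma(\sigma_0^2 - \sigma_1^2)}}{\sigma_0^2 - \sigma_1^2} \qquad (2.7)$$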

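and the error probabilities are given approximately by

$$P_0 \approx \Phi\left[\frac{\sqrt{n}(\gamma_2 - \mu_0)}{\sigma_0}\right] - \Phi\left[\frac{\sqrt{n}(\gamma_1 - \mu_0)}{\sigma_0}\right]$$

$$P_1 \approx \Phi\left[-\frac{\sqrt{n}(\gamma_2 - \mu_1)}{\sigma_1}\right] + \Phi\left[-\frac{\sqrt{n}(\mu_1 - \gamma_1)}{\sigma_1}\right] \qquad (2.8)$$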

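Substituting for $\gamma_1$ and $\gamma_2$ yields

$$P_0 \approx \Phi\left[\sqrt{n}\,\frac{\sigma_0(\mu_1 - \mu_0) + \sigma_1 u(\gamma)}{\sigma_0^2 - \sigma_1^2}\right] - \Phi\left[\sqrt{n}\,\frac{\sigma_0(\mu_1 - \mu_0) - \sigma_1 u(\gamma)}{\sigma_0^2 - \sigma_1^2}\right]$$

$$P_1 \approx \Phi\left[-\sqrt{n}\,\frac{\sigma_0 u(\gamma) - \sigma_1(\mu_1 - \mu_0)}{\sigma_0^2 - \sigma_1^2}\right] + \Phi\left[-\sqrt{n}\,\frac{\sigma_1(\mu_1 - \mu_0) + \sigma_0 u(\gamma)}{\sigma_0^2 - \sigma_1^2}\right] \qquad (2.9)$$

where $u(\gamma) = \sqrt{(\mu_1 - \mu_0)^2 - \gamma(\sigma_0^2 - \sigma_1^2)}$. Thus we have the approximate error probabilities
given as functions of a parameter $\gamma$.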
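As a check on (2.7)-(2.9), the sketch below computes $\gamma_1$ and $\gamma_2$ for a given $\gamma$ (which must satisfy $\gamma \le (\mu_1 - \mu_0)^2/(\sigma_0^2 - \sigma_1^2)$ so that $u(\gamma)$ is real). It verifies that both roots make the two sides of (2.3) equal and evaluates the approximate error probabilities (2.8). The numerical values and function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def two_thresholds(mu0, mu1, s0, s1, gamma):
    """Roots gamma_1 < gamma_2 from (2.7); requires s0 > s1 and a real u(gamma)."""
    den = s0 ** 2 - s1 ** 2
    u = math.sqrt((mu1 - mu0) ** 2 - gamma * den)  # u(gamma) as defined after (2.9)
    center = s0 ** 2 * mu1 - s1 ** 2 * mu0
    return (center - s0 * s1 * u) / den, (center + s0 * s1 * u) / den


def lhs_23(t, mu0, mu1, s0, s1):
    """Left side of (2.3), written in terms of t = T_n / n."""
    return (t - mu0) ** 2 / s0 ** 2 - (t - mu1) ** 2 / s1 ** 2


def error_probs(mu0, mu1, s0, s1, gamma, n):
    """Approximate (P0, P1) from (2.8) for the critical region A = [n*gamma_1, n*gamma_2]."""
    g1, g2 = two_thresholds(mu0, mu1, s0, s1, gamma)
    rn = math.sqrt(n)
    p0 = PHI(rn * (g2 - mu0) / s0) - PHI(rn * (g1 - mu0) / s0)
    p1 = PHI(-rn * (g2 - mu1) / s1) + PHI(-rn * (mu1 - g1) / s1)
    return p0, p1


if __name__ == "__main__":
    args = (0.0, 1.0, 2.0, 1.0)  # mu0, mu1, sigma0, sigma1
    g1, g2 = two_thresholds(*args, gamma=0.1)
    print(g1, g2, lhs_23(g1, *args), lhs_23(g2, *args))  # both left sides equal 0.1
    print(error_probs(*args, gamma=0.1, n=25))
```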

For situations where the error probabilities are relatively small, little is to be
gained by preferring a two-threshold test to a single-threshold test. This is the gist of
Proposition 2 below. In order to make the proposition precise, it is necessary to assume
that $T_n$ is truly Gaussian and not merely approximately so.


Proposition 2. Let $T_n$ be a test statistic which has the distribution $N(n\mu_i, n\sigma_i^2)$
under hypothesis $H_i$ for $i = 0, 1$, with $\sigma_0^2 > \sigma_1^2$. Let $P_0^{(2)}$ and $P_1^{(2)}$ denote the error
probabilities for the decision rule $d^{(2)}$ of the form (1.3) with $A = [n\gamma_1, n\gamma_2]$, where $\gamma_1$ and
$\gamma_2$ are given by (2.7). Let $P_0^{(1)}$ and $P_1^{(1)}$ denote the error probabilities for the decision
rule $d^{(1)}$, also of the form (1.3), but with $A = [n\gamma_1, \infty)$. Assume that the thresholds are
chosen for each sample size $n$ so that $P_0^{(2)} = \alpha$. Then as $n \to \infty$, we have

$$\frac{P_1^{(2)} - P_1^{(1)}}{P_1^{(1)}} \to 0.$$


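Proof. Define

$$\nu_1 = \frac{\sigma_0 u(\gamma) - \sigma_1(\mu_1 - \mu_0)}{\sigma_0^2 - \sigma_1^2}, \qquad \nu_2 = \frac{\sigma_1(\mu_1 - \mu_0) + \sigma_0 u(\gamma)}{\sigma_0^2 - \sigma_1^2},$$

$$\lambda_1 = \frac{\sigma_0(\mu_1 - \mu_0) + \sigma_1 u(\gamma)}{\sigma_0^2 - \sigma_1^2}, \qquad \lambda_2 = \frac{\sigma_0(\mu_1 - \mu_0) - \sigma_1 u(\gamma)}{\sigma_0^2 - \sigma_1^2}.$$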

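Then from (2.9) we have

$$P_1^{(2)} = \Phi(-\sqrt{n}\,\nu_1) + \Phi(-\sqrt{n}\,\nu_2), \qquad P_1^{(1)} = \Phi(-\sqrt{n}\,\nu_1),$$

and

$$P_0^{(2)} = \Phi(\sqrt{n}\,\lambda_1) - \Phi(\sqrt{n}\,\lambda_2). \qquad (2.10)$$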


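Since $\lambda_1 > \sigma_0(\mu_1 - \mu_0)/(\sigma_0^2 - \sigma_1^2) > 0$, the first term of (2.10) converges to 1 as $n \to \infty$.
But $P_0^{(2)} = \alpha$, so the second term converges to $1 - \alpha$. Therefore $\lambda_2 \to 0$. This implies
that $u(\gamma) \to (\sigma_0/\sigma_1)(\mu_1 - \mu_0)$, and thus

$$\nu_1 \to \frac{\mu_1 - \mu_0}{\sigma_1} > 0.$$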
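The proposition will follow if we can show that

$$\frac{\Phi(-\sqrt{n}\,\nu_2)}{\Phi(-\sqrt{n}\,\nu_1)} \to 0.$$

To do this, use the inequalities [8, p. 39]

$$\frac{-x}{1+x^2}\,\frac{e^{-x^2/2}}{\sqrt{2\pi}} < \Phi(x) < \frac{-1}{x}\,\frac{e^{-x^2/2}}{\sqrt{2\pi}},$$

which are valid for $x < 0$. Since $\nu_2 \to (\mu_1 - \mu_0)(\sigma_0^2 + \sigma_1^2)/[\sigma_1(\sigma_0^2 - \sigma_1^2)]$, which exceeds the
limit $(\mu_1 - \mu_0)/\sigma_1$ of $\nu_1$, both $\nu_1$ and $\nu_2$ are eventually positive and $\nu_2^2 - \nu_1^2$ tends to a
positive constant. Thus we have

$$\frac{\Phi(-\sqrt{n}\,\nu_2)}{\Phi(-\sqrt{n}\,\nu_1)} < \frac{\nu_1}{\nu_2} \cdot \frac{1 + n\nu_1^2}{n\nu_1^2}\,\exp\left[-\frac{n}{2}(\nu_2^2 - \nu_1^2)\right] \to 0. \qquad \square$$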
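Proposition 2 can also be observed numerically. In the sketch below, an illustration under the assumed values $\mu_0 = 0$, $\mu_1 = 1$, $\sigma_0 = 2$, $\sigma_1 = 1$, $\alpha = 0.05$, the parameter $u = u(\gamma) \ge 0$ is found by bisection so that $P_0^{(2)} = \Phi(\sqrt{n}\lambda_1) - \Phi(\sqrt{n}\lambda_2) = \alpha$. Bisection works because $P_0^{(2)}$ increases with $u$. The code then returns $(P_1^{(2)} - P_1^{(1)})/P_1^{(1)} = \Phi(-\sqrt{n}\nu_2)/\Phi(-\sqrt{n}\nu_1)$. The ratio is evaluated in that second form, and $\Phi$ is computed through `math.erfc`, to avoid cancellation in the far tail. Function names are hypothetical.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF via erfc, which stays accurate far into the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def prop2_ratio(mu0, mu1, s0, s1, n, alpha, iters=200):
    """(P1^(2) - P1^(1)) / P1^(1) for Gaussian T_n, thresholds chosen so that P0^(2) = alpha."""
    d, den, rn = mu1 - mu0, s0 ** 2 - s1 ** 2, math.sqrt(n)

    def p0_two(u):  # Phi(sqrt(n) lambda_1) - Phi(sqrt(n) lambda_2), increasing in u
        lam1 = (s0 * d + s1 * u) / den
        lam2 = (s0 * d - s1 * u) / den
        return Phi(rn * lam1) - Phi(rn * lam2)

    lo, hi = 0.0, 1.0  # p0_two(0) = 0 < alpha
    while p0_two(hi) < alpha:
        hi *= 2.0
    for _ in range(iters):  # bisection keeps p0_two(lo) < alpha <= p0_two(hi)
        mid = 0.5 * (lo + hi)
        if p0_two(mid) < alpha:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    nu1 = (s0 * u - s1 * d) / den
    nu2 = (s1 * d + s0 * u) / den
    return Phi(-rn * nu2) / Phi(-rn * nu1)


if __name__ == "__main__":
    for n in (5, 20, 50, 100):
        print(n, prop2_ratio(0.0, 1.0, 2.0, 1.0, n, 0.05))
```

At small $n$ the second threshold still matters noticeably, but the ratio collapses quickly as $n$ grows, in line with the exponential bound in the proof.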

Because the test statistics which we consider are only approximately normal, this
proposition does not apply directly; it has been presented to provide a heuristic
argument for a single-threshold test. In fact, the proof of the proposition depends in a
crucial way on the tail behavior of the distributions, and for a normalized sum which converges
in distribution under a central limit theorem, convergence is usually slowest in the tail region. In most
practical situations, however, the realizations of the test statistic under $H_i$ tend to pile
up around $n\mu_i$, so that if $n(\mu_1 - \mu_0)$ is large, a two-threshold test offers no advantage
over a single-threshold test. Thus we may justify a single-threshold test not only on a
heuristic basis but also from practical considerations.

We are now ready to derive the performance measure $S_1$ for the case where $\sigma_0^2 > \sigma_1^2$
and a single-threshold test of the form (1.3) is used. With $A = [n\gamma, \infty)$, the
error probability under $H_0$ is given approximately by

$$P_0 \approx \Phi\left[\frac{\sqrt{n}(\mu_0 - \gamma)}{\sigma_0}\right].$$

If we set $P_0 = \alpha$, then we find the value of $\gamma$ to be

$$\gamma = -\frac{\sigma_0}{\sqrt{n}}\Phi^{-1}(\alpha) + \mu_0.$$

On the other hand, the error probability under $H_1$ is given approximately by

$$P_1 \approx \Phi\left[\frac{\sqrt{n}(\gamma - \mu_1)}{\sigma_1}\right] = \Phi\left[-\frac{\sigma_0}{\sigma_1}\Phi^{-1}(\alpha) - \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma_1}\right], \qquad (2.11)$$

where the last equality in (2.11) is obtained by substituting for $\gamma$. Now the
quantity $\sigma_0/\sigma_1$ does not depend on the sample size $n$. Therefore the quantity $(\mu_1 - \mu_0)/\sigma_1$
again determines the rate at which $P_1$ converges to zero. By the same reasoning as before,
then, we see that the best asymptotic performance results when $S_1$, as given by (2.6), is
maximized.

The results obtained in this section do not depend on the form of the test statistic
$T_n$, only on the assumption that there exist constants $\mu_0$, $\mu_1$, $\sigma_0$, and $\sigma_1$ such that
$(T_n - n\mu_i)/\sqrt{n\sigma_i^2}$ converges in distribution to a standard normal random variable when
$H_i$ is true. In the remainder of this chapter, we restrict our attention to test statistics of
the form $T_n = \sum_{i=1}^{n} g(X_i)$ for which the conditions of Theorem 1 hold. Thus the "moments"
$\mu_0$, $\mu_1$, $\sigma_0$, $\sigma_1$ are given by (1.7) and (1.8), and the performance measure $S_1$ becomes
a functional of $g$.


2.2 The Optimal Nonlinearity


In the first section of this chapter we showed heuristically that the best test
statistic in the asymptotic sense for the Neyman-Pearson problem (2.1) is the one for which
the performance measure $S_1$ is maximized. In this section, we will consider the following
problem:

$$\text{maximize } S_1(g) = \frac{[\mu_1(g) - \mu_0(g)]^2}{\sigma_1^2(g)} \qquad (2.12)$$

subject to the constraints that $E_i\, g^2(X_1) < \infty$ for $i = 0, 1$. Thus if $g_1$ solves (2.12), then
the test statistic $T_n = \sum_{i=1}^{n} g_1(X_i)$ is optimal in the sense of Section 2.1 over the class of all
memoryless test statistics (1.4). In the next section, conditions are given which guarantee
that the nonlinearity $g_1$ derived in this section satisfies the constraint $E_1\, g_1^2(X_1) < \infty$.
In order to also have $E_0\, g_1^2(X_1) < \infty$, it is then sufficient to require that $f_0(x)/f_1(x)$ be
bounded for all $x$ (since $E_0\, g_1^2(X_1) \le \sup_x [f_0(x)/f_1(x)]\, E_1\, g_1^2(X_1)$), and we shall make this
assumption. We shall also take this condition to mean that $f_1(x) = 0 \Rightarrow f_0(x) = 0$. We
assume that $g$ and all the densities involved are continuous so that we can apply the
classical techniques of the calculus of variations. Naturally, we will have to assume that
the conditions of Theorem 1 are satisfied under each hypothesis, so that the test statistic
$T_n$ satisfies our assumption of asymptotic normality. In this section, however, we shall
require the more stringent condition that the observed process be $m$-dependent under
either hypothesis. Observe that for an $m$-dependent process the expression for $\sigma_i^2(g)$ given
by (1.8) becomes

$$\sigma_i^2(g) = E_i\, g^2(X_1) + 2\sum_{j=1}^{m} E_i\, g(X_1)g(X_{j+1}) - (2m+1)[E_i\, g(X_1)]^2. \qquad (2.13)$$

At the end of this chapter, we shall discuss the extension of the $m$-dependent results to
the more general case of $\phi$-mixing processes, and show that for all practical purposes,
$\phi$-mixing processes can be approximated by $m$-dependent ones.
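For a concrete instance of (2.13), consider the (assumed) $m$-dependent moving-average process $X_t = c + \sum_{k=0}^{m} b_k e_{t-k}$ with iid zero-mean, unit-variance $e_t$, and take $g(x) = x$. Then $E\,g^2(X_1) = \gamma_0 + c^2$ and $E\,g(X_1)g(X_{j+1}) = \gamma_j + c^2$ with $\gamma_j = \sum_k b_k b_{k+j}$. Equation (2.13) should therefore reduce to the familiar long-run variance $(\sum_k b_k)^2$, with the $c^2$ terms cancelling. The sketch below checks this; the process and names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def sigma2_m_dependent(second: float, cross: Sequence[float], mean: float) -> float:
    """(2.13): E g^2(X1) + 2 sum_{j=1}^m E g(X1) g(X_{j+1}) - (2m + 1) (E g(X1))^2."""
    m = len(cross)
    return second + 2.0 * sum(cross) - (2 * m + 1) * mean ** 2


def ma_moments(b: Sequence[float], c: float) -> Tuple[float, List[float], float]:
    """Moments needed by (2.13) for X_t = c + sum_k b[k] e_{t-k} and g(x) = x."""
    m = len(b) - 1
    acov = [sum(b[k] * b[k + j] for k in range(m + 1 - j)) for j in range(m + 1)]
    second = acov[0] + c * c
    cross = [acov[j] + c * c for j in range(1, m + 1)]
    return second, cross, c


if __name__ == "__main__":
    b, c = [1.0, 0.5, 0.25], 3.0  # a 2-dependent process with nonzero mean
    print(sigma2_m_dependent(*ma_moments(b, c)), sum(b) ** 2)  # 3.0625 3.0625
```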

In the remainder of this section, we derive an integral equation of which the
optimal nonlinearity $g_1$ is a solution. We begin by observing that the value of $S_1$ is
unchanged when $g$ is multiplied by a nonzero constant; hence we can maximize $(\mu_1 - \mu_0)^2$ with
$\sigma_1^2$ held constant. But under our assumption that $\mu_1 - \mu_0 > 0$, maximizing $(\mu_1 - \mu_0)^2$ is
equivalent to maximizing $\mu_1 - \mu_0$. We will therefore introduce a Lagrange multiplier $\lambda$
and consider the problem

$$\text{maximize } \mu_1 - \mu_0 - \lambda\sigma_1^2. \qquad (2.14)$$

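Now define $J(g) = \mu_1 - \mu_0 - \lambda\, \sigma_1^2$. A necessary condition for $g$ to maximize $J(\cdot)$ is that the Gateaux variation of $J$ at $g$ vanish. Thus we must have $\frac{d}{d\epsilon} J(g + \epsilon\, \delta g)\big|_{\epsilon=0} = 0$, where $\delta g$ is an arbitrary continuous function satisfying $E_i\, \delta g^2(X_1) < \infty$ for $i = 0, 1$. Since

$$
\begin{aligned}
J(g + \epsilon\, \delta g) = J(g) + \epsilon\Big\{ & E_1\, \delta g(X_1) - E_0\, \delta g(X_1) - 2\lambda\Big[E_1\, g(X_1)\, \delta g(X_1) \\
& + \sum_{j=1}^{m}\big(E_1\, g(X_1)\, \delta g(X_{j+1}) + E_1\, g(X_{j+1})\, \delta g(X_1)\big) \\
& - (2m+1)\, E_1\, g(X_1)\, E_1\, \delta g(X_1)\Big]\Big\} + \epsilon^2\big[-\lambda\, \sigma_1^2(\delta g)\big],
\end{aligned} \tag{2.15}
$$

which is a quadratic function of $\epsilon$, the Gateaux variation is given by the coefficient of $\epsilon$. Denote by $f_i$ and $f_i^j$ the densities of $X_1$ and $(X_1, X_{j+1})$, respectively, under $H_i$. If we introduce these densities into the above expression and set the coefficient of $\epsilon$ equal to 0, we obtain

$$
0 = \int \delta g(x)\Big\{ f_1(x) - f_0(x) - 2\lambda\Big[f_1(x)\, g(x) + \int\Big(\sum_{j=1}^{m}\big[f_1^j(x,y) + f_1^j(y,x)\big] - (2m+1)\, f_1(x)\, f_1(y)\Big) g(y)\, dy\Big]\Big\}\, dx. \tag{2.16}
$$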

The right-hand side of (2.16) is zero for arbitrary $\delta g$ iff the quantity in braces is identically zero. Setting this expression equal to zero and dividing by $f_1(x)$, we easily derive the integral equation

$$2\lambda\, g(x) = \frac{f_1(x) - f_0(x)}{f_1(x)} + 2\lambda \int K_1(x,y)\, g(y)\, dy \tag{2.17}$$

with the kernel $K_1$ given by

$$K_1(x,y) = (2m+1)\, f_1(y) - \frac{1}{f_1(x)}\sum_{j=1}^{m}\big[f_1^j(x,y) + f_1^j(y,x)\big]. \tag{2.18}$$

In the next section we will discuss the conditions which guarantee the existence of a solution to this integral equation. For now, we note that in order to divide by $f_1(x)$ as we have done, we require the absolute continuity condition $f_1(x) = 0 \Rightarrow f_0(x) = 0$.

We have derived the integral equation (2.17) as a necessary condition for $g$ to solve the maximization problem (2.14). However, if $g$ solves (2.17), then $J(g + \epsilon\, \delta g) = J(g) - \epsilon^2 \lambda\, \sigma_1^2(\delta g)$, so that $J(g) \ge J(g + \epsilon\, \delta g)$ provided $\lambda > 0$. Thus if $\lambda > 0$, the condition that $g$ solve the integral equation (2.17) is also sufficient for $g$ to solve the maximization problem. From the form of (2.17) it is clear that $\lambda$ only determines the scaling of $g$; thus we may take an arbitrary positive value for $\lambda$. In the analysis that follows, it will be convenient to take $\lambda = \frac{1}{2}$, and with this value the integral equation (2.17) becomes

$$g(x) = \frac{f_1(x) - f_0(x)}{f_1(x)} + \int K_1(x,y)\, g(y)\, dy. \tag{2.19}$$

If we make the substitution $g(x) = h(x)/\sqrt{f_1(x)}$, the integral equation (2.19) becomes

$$h(x) = \frac{f_1(x) - f_0(x)}{\sqrt{f_1(x)}} + \int K_1'(x,y)\, h(y)\, dy, \tag{2.20}$$

which has the symmetric kernel

$$K_1'(x,y) = (2m+1)\, [f_1(x)\, f_1(y)]^{1/2} - [f_1(x)\, f_1(y)]^{-1/2}\sum_{j=1}^{m}\big[f_1^j(x,y) + f_1^j(y,x)\big]. \tag{2.21}$$

The purpose of the substitution leading to (2.20) is to permit us to apply the Hilbert-Schmidt theory for integral equations with symmetric kernels, which is the topic of the next section.
Consider now $\sigma_1^2(g_1)$, which we may write in the form

$$
\begin{aligned}
\sigma_1^2(g_1) &= \int g_1^2(x)\, f_1(x)\, dx + \iint g_1(x)\, g_1(y)\sum_{j=1}^{m}\big[f_1^j(x,y) + f_1^j(y,x)\big]\, dx\, dy \\
&\quad - (2m+1)\iint g_1(x)\, g_1(y)\, f_1(x)\, f_1(y)\, dx\, dy \\
&= \int g_1(x)\, f_1(x)\Big\{g_1(x) + \int\Big(\frac{1}{f_1(x)}\sum_{j=1}^{m}\big[f_1^j(x,y) + f_1^j(y,x)\big] - (2m+1)\, f_1(y)\Big) g_1(y)\, dy\Big\}\, dx \\
&= \int g_1(x)\, f_1(x)\Big[g_1(x) - \int K_1(x,y)\, g_1(y)\, dy\Big]\, dx.
\end{aligned}
$$

From this expression it can be seen that if $g_1$ solves the integral equation (2.19), then $\sigma_1^2(g_1) = \int g_1(x)\, [f_1(x) - f_0(x)]\, dx = \mu_1(g_1) - \mu_0(g_1)$; thus $S_1(g_1) = \mu_1(g_1) - \mu_0(g_1)$ is the optimal value of $S_1$. We summarize the results of this section in the following theorem.


Theorem 3. If the process $\{X_i\}$ is $m$-dependent under both $H_0$ and $H_1$, then a sufficient condition for $g_1$ to maximize $S_1$ is that $g_1$ solve the integral equation (2.19). Furthermore, if $g_1$ solves (2.19), then $S_1(g_1) = \mu_1(g_1) - \mu_0(g_1)$.


2.3  The  Solution  of  the  Integral  Equation 


To apply the theory of Fredholm equations we require the following two conditions:

(a) $\displaystyle\int \frac{[f_1(x) - f_0(x)]^2}{f_1(x)}\, dx < \infty$,

(b) $\displaystyle\iint |K_1'(x,y)|^2\, dx\, dy < \infty$.

Under the assumption that $f_0(x)/f_1(x)$ is bounded, condition (a) follows easily, since $[f_1(x) - f_0(x)]^2/f_1(x) = f_1(x)\, [1 - f_0(x)/f_1(x)]^2$. To show condition (b), we assume that the densities $f_1^j(x,y)$, $j = 1, \ldots, m$, have the diagonal expansion [5]

$$f_1^j(x,y) = f_1(x)\, f_1(y)\sum_{n=0}^{\infty} a_n^{(j)}\, \theta_n(x)\, \theta_n(y), \tag{2.22}$$

where the functions $\{\theta_n\}$ are orthonormal in the sense that $\int \theta_m(x)\, \theta_n(x)\, f_1(x)\, dx = \delta_{mn}$. Some examples of densities which are known to have such an expansion are the Gaussian and gamma densities. Consider now the terms in the expansion of $|K_1'|^2$. We examine only the terms of the form $f_1^j(x,y)\, f_1^k(x,y)/[f_1(x)\, f_1(y)]$, the other terms being more obviously integrable. If we introduce the expansion (2.22) and apply the orthogonality relation, we have


$$\iint \frac{f_1^j(x,y)\, f_1^k(x,y)}{f_1(x)\, f_1(y)}\, dx\, dy = \sum_{n=0}^{\infty} a_n^{(j)}\, a_n^{(k)} < \infty,$$

from which condition (b) follows. Conditions (a) and (b) are sufficient to guarantee that the solution $h_1(x) = \sqrt{f_1(x)}\, g_1(x)$, if it exists, is square integrable, and this in turn implies that $E_1\, g_1^2(X_1) < \infty$.
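For jointly Gaussian pairs, Mehler's formula gives the expansion (2.22) with $\theta_n = He_n/\sqrt{n!}$ (probabilists' Hermite polynomials) and $a_n^{(j)} = \rho_j^n$, so the cross-term integral above should equal $\sum_n (\rho_j\rho_k)^n = 1/(1 - \rho_j\rho_k)$. The sketch below checks this with a plain trapezoidal double integral; standard normal marginals and the correlations 0.5 and 0.3 are hypothetical choices.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bivariate_normal_pdf(x, y, rho):
    """Standard bivariate normal density with correlation rho."""
    det = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def cross_term(rho_j, rho_k, half_width=10.0, step=0.05):
    """Trapezoidal double integral of f^j f^k / (f_1(x) f_1(y)) over a square."""
    n = int(round(2.0 * half_width / step))
    nodes = [-half_width + step * i for i in range(n + 1)]
    weights = [step * (0.5 if i in (0, n) else 1.0) for i in range(n + 1)]
    total = 0.0
    for x, wx in zip(nodes, weights):
        for y, wy in zip(nodes, weights):
            ratio = (bivariate_normal_pdf(x, y, rho_j) * bivariate_normal_pdf(x, y, rho_k)
                     / (std_normal_pdf(x) * std_normal_pdf(y)))
            total += wx * wy * ratio
    return total


rho_j, rho_k = 0.5, 0.3
value = cross_term(rho_j, rho_k)
print(value, 1.0 / (1.0 - rho_j * rho_k))  # quadrature vs. the series sum
```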

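In the iid case, the kernel $K_1'$ reduces to $[f_1(x)\, f_1(y)]^{1/2}$, and it is easy to verify the solution

$$h(x) = c\sqrt{f_1(x)} - \frac{f_0(x)}{\sqrt{f_1(x)}},$$

where $c$ is an arbitrary constant. Note that the absolute term of the integral equation is of this form when $c = 1$. We may therefore define $h_{\mathrm{iid}}$ by

$$h_{\mathrm{iid}}(x) = \frac{f_1(x) - f_0(x)}{\sqrt{f_1(x)}} \tag{2.23}$$

and write the integral equation as

$$h(x) = h_{\mathrm{iid}}(x) + \int K_1'(x,y)\, h(y)\, dy. \tag{2.24}$$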
When the process is not iid, we can still solve the integral equation using the Hilbert-Schmidt theory, provided we can find the eigenvalues and eigenfunctions of the kernel. According to the theory, a unique solution exists provided conditions (a) and (b) above hold and $+1$ is not an eigenvalue of the kernel. If 1 is an eigenvalue, a (non-unique) solution still exists provided the absolute term $h_{\mathrm{iid}}$ is orthogonal to every eigenfunction corresponding to the eigenvalue 1. We shall see that 1 is an eigenvalue of the kernel $K_1'$, but that in most cases a solution still exists.

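If we assume that the densities $f_1^j$, $j = 1, \ldots, m$, have the expansion (2.22) and introduce this expansion into the kernel, we have

$$K_1'(x,y) = \sqrt{f_1(x)\, f_1(y)}\,\Big[(2m+1) - 2\sum_{n=0}^{\infty}\Big(\sum_{j=1}^{m} a_n^{(j)}\Big)\theta_n(x)\, \theta_n(y)\Big]. \tag{2.25}$$

Usually we will have $\theta_0(x) = 1$ for such an expansion (and then $a_0^{(j)} = 1$, since each $f_1^j$ has marginal density $f_1$), and in such cases $K_1'$ will have eigenvalues $\{\lambda_n\}$ and eigenfunctions $\{\phi_n\}$ given by

$$\lambda_0 = 1, \qquad \lambda_n = -2\sum_{j=1}^{m} a_n^{(j)} \quad (n \ge 1), \qquad \phi_n = \theta_n\sqrt{f_1} \quad (n \ge 0).$$

Since $\lambda_0 = 1$, we must verify that $h_{\mathrm{iid}}$ is orthogonal to $\phi_0 = \sqrt{f_1}$, which is trivial:

$$\int h_{\mathrm{iid}}(x)\, \phi_0(x)\, dx = \int [f_1(x) - f_0(x)]\, dx = 0.$$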
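These eigenpairs can be spot-checked numerically. The sketch assumes a hypothetical 1-dependent Gaussian process with standard normal marginals and lag-one correlation $\rho$, so that $m = 1$, $\theta_n = He_n/\sqrt{n!}$ and $a_n^{(1)} = \rho^n$ by Mehler's formula; the prediction is then $\lambda_0 = 1$ and $\lambda_n = -2\rho^n$. It applies the kernel (2.21) to $\phi_n$ by trapezoidal quadrature and divides by $\phi_n(x)$ at one point $x$.

```python
import math


def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x He_k - k He_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bivariate_normal_pdf(x, y, rho):
    det = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def kernel_k1_prime(x, y, rho):
    """Kernel (2.21) with m = 1 and symmetric joint density f_1^1 = bivariate normal."""
    s = math.sqrt(std_normal_pdf(x) * std_normal_pdf(y))
    return 3.0 * s - 2.0 * bivariate_normal_pdf(x, y, rho) / s


def phi_n(n, t):
    """Candidate eigenfunction theta_n(t) sqrt(f_1(t)) with theta_n = He_n / sqrt(n!)."""
    return hermite_he(n, t) / math.sqrt(math.factorial(n)) * math.sqrt(std_normal_pdf(t))


def eigen_ratio(n, x, rho, half_width=12.0, step=0.01):
    """Trapezoidal approximation of (K_1' phi_n)(x) / phi_n(x)."""
    k = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(k + 1):
        y = -half_width + step * i
        w = 0.5 if i in (0, k) else 1.0
        total += w * step * kernel_k1_prime(x, y, rho) * phi_n(n, y)
    return total / phi_n(n, x)


rho = 0.4
for n in range(4):
    print(n, eigen_ratio(n, 0.7, rho), 1.0 if n == 0 else -2.0 * rho ** n)
```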

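If $\lambda_n \ne 1$ for $n \ge 1$, we have the solution

$$
\begin{aligned}
h(x) &= h_{\mathrm{iid}}(x) + \sum_{n=1}^{\infty}\frac{\lambda_n e_n}{1 - \lambda_n}\, \phi_n(x) + c\, \phi_0(x) \\
&= h_{\mathrm{iid}}(x) + \sum_{n=1}^{\infty}\frac{\lambda_n e_n}{1 - \lambda_n}\, \theta_n(x)\sqrt{f_1(x)} + c\sqrt{f_1(x)}
\end{aligned}
$$

with

$$e_n = \int h_{\mathrm{iid}}(x)\, \phi_n(x)\, dx = \int [f_1(x) - f_0(x)]\, \theta_n(x)\, dx \tag{2.26}$$

and $c$ an arbitrary constant. Therefore, the nonlinearity $g = h/\sqrt{f_1}$ (with $c = -1$) is given by

$$g(x) = -\frac{f_0(x)}{f_1(x)} + \sum_{n=1}^{\infty}\frac{\lambda_n e_n}{1 - \lambda_n}\, \theta_n(x). \tag{2.27}$$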
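A minimal sketch of (2.27) in a hypothetical example, with assumptions not taken from the text: under $H_1$ the process is 1-dependent Gaussian with $N(0,1)$ marginals and lag-one correlation $\rho$ (so $\lambda_n = -2\rho^n$ and $\theta_n = He_n/\sqrt{n!}$ by Mehler's formula), and under $H_0$ the marginal is $N(\mu, 1)$, for which $e_n = -\mu^n/\sqrt{n!}$ for $n \ge 1$ (using $E\, He_n(X) = \mu^n$ for $X \sim N(\mu, 1)$). Only the $H_0$ marginal enters (2.19). Here $f_0/f_1 = e^{\mu x - \mu^2/2}$ is unbounded, so the example lies outside the boundedness assumption made earlier in this chapter, although condition (a) still holds since $\int f_0^2/f_1\, dx - 1 = e^{\mu^2} - 1$. The code sums the truncated series and checks, by quadrature in $y$, that it satisfies (2.19) with $m = 1$.

```python
import math

RHO, MU = 0.4, 0.8  # hypothetical: lag-one correlation under H1, mean shift under H0


def normal_pdf(y, mean=0.0, var=1.0):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def g_series(x, rho=RHO, mu=MU, n_terms=40):
    """Truncated (2.27): -f0/f1 + sum_n lambda_n e_n / (1 - lambda_n) theta_n(x)."""
    total = -math.exp(mu * x - 0.5 * mu * mu)      # -f0(x) / f1(x)
    prev, cur = 1.0, x                              # He_0(x), He_1(x)
    fact = 1.0
    for n in range(1, n_terms + 1):
        fact *= n                                   # n!
        lam = -2.0 * rho ** n                       # lambda_n for m = 1
        # e_n = -mu^n / sqrt(n!) and theta_n = He_n / sqrt(n!): the roots combine into n!
        total += lam * (-mu ** n) / (1.0 - lam) * cur / fact
        prev, cur = cur, x * cur - n * prev         # He_{n+1} = x He_n - n He_{n-1}
    return total


def rhs_2_19(x, rho=RHO, mu=MU, half_width=12.0, step=0.01):
    """Right-hand side of (2.19) for m = 1, with the y-integral by the trapezoidal rule.

    K_1(x, y) = 3 f_1(y) - 2 f_1^1(x, y) / f_1(x), and f_1^1(x, y) / f_1(x)
    is the N(rho * x, 1 - rho^2) density in y.
    """
    k = int(round(2.0 * half_width / step))
    integral = 0.0
    for i in range(k + 1):
        y = -half_width + step * i
        w = 0.5 if i in (0, k) else 1.0
        kern = 3.0 * normal_pdf(y) - 2.0 * normal_pdf(y, rho * x, 1.0 - rho * rho)
        integral += w * step * kern * g_series(y, rho, mu)
    return 1.0 - math.exp(mu * x - 0.5 * mu * mu) + integral   # (f1 - f0)/f1 + integral


checks = [(x, g_series(x), rhs_2_19(x)) for x in (-1.0, 0.0, 0.5, 1.5)]
for x, lhs, rhs in checks:
    print(x, lhs, rhs)  # the two columns should agree
```

Setting $\rho = 0$ recovers the iid solution $g = -f_0/f_1$, i.e. the likelihood ratio up to sign and an additive constant.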


2.4 Extension to $\phi$-Mixing Processes

In this section we consider again the optimization problem (2.12), now in the more general case where the processes are assumed to be $\phi$-mixing. We will use a compactness argument to prove that the optimization problem has a solution $g_1$, and then show that if $g_1^{(m)}$ solves the integral equation (2.19), then the sequence $S_1(g_1^{(m)})$ converges to the optimal value $S_1(g_1)$ as $m \to \infty$. Similar results hold for the performance measure $S_0$.

First, let us define some new symbols. Let $L^2(f_1)$ denote the Hilbert space consisting of the Borel functions $g$ such that $\int g^2(x)\, f_1(x)\, dx < \infty$, with the inner product $\langle g, h\rangle = \int g(x)\, h(x)\, f_1(x)\, dx$ and norm $\|g\| = [\int g^2(x)\, f_1(x)\, dx]^{1/2}$. Let $Q$ be the subset of $L^2(f_1)$ which contains the elements $g$ such that $\int g(x)\, f_1(x)\, dx = 0$ and $\int g^2(x)\, f_1(x)\, dx = 1$, and note that $Q$ is compact. Define

$$\sigma_{i,m}^2(g) = E_i\, g^2(X_1) + 2\sum_{j=1}^{m} E_i\, g(X_1)\, g(X_{j+1}) - (2m+1)\, [E_i\, g(X_1)]^2 \tag{2.28}$$

and

$$S_i^{(m)}(g) = \frac{[\mu_1(g) - \mu_0(g)]^2}{\sigma_{i,m}^2(g)} \tag{2.29}$$

for $i = 0, 1$. If the process is $m$-dependent under $H_i$, then $\sigma_{i,m}^2 = \sigma_i^2$. Finally, let $g_1^{(m)}$ denote the solution to the integral equation (2.19), which by Theorem 3 maximizes $S_1^{(m)}$ over the space $L^2(f_1)$.

Lemma 4. The functional $S_1^{(m)}$ is continuous.

Proof. That the numerator of (2.29) is continuous follows from the Schwarz inequality and assumption (a) of Section 2.3:

$$\big|[\mu_1(g) - \mu_0(g)] - [\mu_1(h) - \mu_0(h)]\big| = \big|\langle g - h,\, (f_1 - f_0)/f_1\rangle\big| \le \|g - h\|\, \|(f_1 - f_0)/f_1\|.$$

Thus we must show that $\sigma_{1,m}^2(\cdot)$ is continuous. The first term on the right-hand side of (2.28) is the composition of the maps $g \mapsto \|g\|$ and $x \mapsto x^2$, so it is continuous. The map $g \mapsto \int g(x)\, f_1(x)\, dx = \langle g, 1\rangle$ is continuous by the Riesz representation theorem. Thus the last term of (2.28) is continuous. Now suppose $(\Omega, \mathcal{F}, P_1)$ is the underlying probability space when $H_1$ is true, and let $L^2(P_1)$ denote the Hilbert space consisting of all random variables $X$ such that $E_1 X^2 < \infty$, with the inner product $\langle X, Y\rangle = E_1\, XY$. The random variables $g(X_j)$ for $j = 1, 2, \ldots$ are in $L^2(P_1)$ if $g$ is in $L^2(f_1)$. Also, if $\|\cdot\|_1$ denotes the norm in $L^2(f_1)$ and $\|\cdot\|_2$ denotes the norm in $L^2(P_1)$, then $\|g - h\|_1 = \|g(X_j) - h(X_j)\|_2$ by stationarity. By the continuity of the inner product, then, $|E_1\, g(X_1)\, g(X_j) - E_1\, h_n(X_1)\, h_n(X_j)| \to 0$ as $\|g - h_n\|_1 \to 0$. Thus the remaining terms of (2.28) are continuous. $\square$

Define the class $C$ to consist of all bivariate joint densities which have the diagonal expansion (2.22) with $\{\theta_n\}$ being polynomials.

Lemma 5. If the densities $f_1^j$, $j = 1, 2, \ldots$, are in $C$, then the functional $S_1$ is continuous.

Proof. The idea of the proof is to show that $S_1^{(m)}$ converges to $S_1$ uniformly. First we show that $\sigma_{1,m}^2(g)$ converges uniformly to $\sigma_1^2(g)$ for $g$ in $Q$. Let $\{\theta_n\}$ be the orthonormal functions in the expansion (2.22), with $\theta_0 = 1$. Since $g \in L^2(f_1)$, $g$ has an expansion

$$g(x) = \sum_{n=0}^{\infty} b_n\, \theta_n(x), \qquad b_n = \langle g, \theta_n\rangle,$$

where $b_0 = \int g(x)\, f_1(x)\, dx = 0$ for $g \in Q$. Then, introducing the expansion (2.22) and applying the orthogonality relation, we have

$$\iint g(x)\, g(y)\, f_1^j(x,y)\, dx\, dy = \iint \sum_{k=1}^{\infty}\sum_{l=1}^{\infty}\sum_{n=0}^{\infty} b_k\, b_l\, a_n^{(j)}\, \theta_k(x)\, \theta_n(x)\, \theta_l(y)\, \theta_n(y)\, f_1(x)\, f_1(y)\, dx\, dy = \sum_{n=1}^{\infty} b_n^2\, a_n^{(j)}.$$

Since $\int g^2(x)\, f_1(x)\, dx = \sum_n b_n^2 = 1$ for $g \in Q$, the maximum value of $|\iint g(x)\, g(y)\, f_1^j(x,y)\, dx\, dy|$ is equal to the maximum value of the sequence $\{|a_n^{(j)}|, n \ge 1\}$. It will be shown in Chapter 4 that this maximum occurs at either $|a_1^{(j)}|$ or $|a_2^{(j)}|$. Since $\theta_i \in Q$ and $\sigma_1^2(\theta_i) = 1 + 2\sum_j a_i^{(j)} < \infty$ for $i = 1, 2$, we have $1 + 2\sum_j c_j < \infty$, where $c_j = \max(|a_1^{(j)}|, |a_2^{(j)}|)$, and this series dominates the series $\sigma_1^2(g)$ independently of $g$. Thus $\sigma_{1,m}^2(g)$ converges uniformly.

Now if $g$ is any nonconstant element of $L^2(f_1)$, then there exist constants $a$, $b$ such that $ag + b \in Q$. Since $\sigma_{1,m}^2(ag + b)$ converges uniformly, and since $S_1^{(m)}$ and $S_1$ are invariant under such transformations of $g$, it follows that $S_1^{(m)}(g)$ converges uniformly to $S_1(g)$. $\square$
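The domination step can be seen concretely in a hypothetical special case: a stationary Gaussian AR(1) model with standard normal marginals, for which $(X_1, X_{j+1})$ is bivariate normal with correlation $\rho^j$, so Mehler's formula gives $a_n^{(j)} = \rho^{jn}$ with Hermite $\theta_n$ (a density in $C$). Then $\sigma_{1,m}^2(g) = 1 + 2\sum_{j \le m}\sum_{n \ge 1} b_n^2\rho^{jn}$ for $g \in Q$, and the truncation error is bounded, uniformly over $Q$, by $2\sum_{j > m} c_j = 2|\rho|^{m+1}/(1 - |\rho|)$. The sketch checks this bound for random elements of $Q$ with finitely many coefficients; whether the AR(1) example meets the $\phi$-mixing hypothesis of this section is not addressed here.

```python
import random


def sigma2_truncated(b, rho, m):
    """sigma_{1,m}^2(g) from (2.28) for g = sum_n b_n theta_n in Q, with a_n^(j) = rho**(j*n)."""
    return 1.0 + 2.0 * sum(bn * bn * rho ** (j * n)
                           for j in range(1, m + 1)
                           for n, bn in enumerate(b) if n >= 1)


def sigma2_full(b, rho):
    """Limit m -> infinity, using sum_{j>=1} rho**(j*n) = rho**n / (1 - rho**n)."""
    return 1.0 + 2.0 * sum(bn * bn * rho ** n / (1.0 - rho ** n)
                           for n, bn in enumerate(b) if n >= 1)


def tail_bound(rho, m):
    """2 * sum_{j>m} c_j with c_j = max(|a_1^(j)|, |a_2^(j)|) = |rho|**j."""
    r = abs(rho)
    return 2.0 * r ** (m + 1) / (1.0 - r)


def random_element_of_q(n_coeffs, rng):
    """Coefficients with b_0 = 0 (zero mean under f_1) and unit norm, i.e. g in Q."""
    b = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(n_coeffs)]
    norm = sum(v * v for v in b) ** 0.5
    return [v / norm for v in b]


rng = random.Random(0)
b = random_element_of_q(8, rng)
for m in (1, 2, 5, 10):
    gap = abs(sigma2_full(b, 0.6) - sigma2_truncated(b, 0.6, m))
    print(m, gap, tail_bound(0.6, m))  # the gap never exceeds the bound
```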


We  are  now  ready  to  prove  the  main  result. 
Theorem 6. If the densities $f_1^j$, $j = 1, 2, \ldots$, are in $C$, then there exists a solution $g_1 \in Q$ to the optimization problem (2.12). If $g_1^{(m)}$ solves the integral equation (2.19), then the sequence $\{S_1(g_1^{(m)})\}$ converges to $S_1(g_1)$ as $m \to \infty$.

Proof. Since $S_1$ is continuous and the set $Q$ is compact, there exists an element $g'$ of $Q$ such that $S_1$ achieves its maximum value on $Q$ at $g'$. If $g$ is any nonconstant element of $L^2(f_1)$, then there exist constants $a$ and $b$ such that $ag + b \in Q$. But $S_1(g) = S_1(ag + b) \le S_1(g')$. Thus $g'$ solves (2.12).

Let $\epsilon > 0$. In the proof of Lemma 5, it was shown that $S_1^{(m)}$ converges uniformly to $S_1$. Thus there exists an integer $M$ such that for every $m > M$ and every $g \in L^2(f_1)$ we have $|S_1^{(m)}(g) - S_1(g)| < \epsilon$. Let $m > M$ be fixed. If $S_1^{(m)}(g_1^{(m)}) \le S_1(g_1)$, then, since $g_1^{(m)}$ maximizes $S_1^{(m)}$, we must have $S_1(g_1) - \epsilon < S_1^{(m)}(g_1) \le S_1^{(m)}(g_1^{(m)}) \le S_1(g_1)$. Otherwise, we have $S_1(g_1) < S_1^{(m)}(g_1^{(m)}) < S_1(g_1^{(m)}) + \epsilon \le S_1(g_1) + \epsilon$. In either case, $|S_1^{(m)}(g_1^{(m)}) - S_1(g_1)| < \epsilon$. Combining this with $|S_1(g_1^{(m)}) - S_1^{(m)}(g_1^{(m)})| < \epsilon$ gives

$$|S_1(g_1^{(m)}) - S_1(g_1)| < 2\epsilon. \qquad \square$$

According to the theorem, one can achieve a value of the performance measure $S_1$ which is arbitrarily close to the optimal value by solving the integral equation (2.19) with $m$ sufficiently large.
CHAPTER 3


THE  MINIMAX  FORMULATION 


3.1 The Performance Measure $S_2$

Before one can determine the optimal test statistic (or nonlinearity) in some class of allowable test statistics, one must define an ordering on the class. Although the most natural ordering to consider is determined by the receiver operating characteristic (ROC), the ROC itself does not provide a total ordering: given two different test statistics, one cannot always say which is better by comparing their ROCs. Rather, in order to have a total ordering, one must specify a particular region, or operating point, of the ROC. Under the Neyman-Pearson formulation, this operating point is specified to be such that $P_0 = \alpha$. Such an operating point may be undesirable when the asymptotic performance is important, since only $P_1$ converges to 0 while $P_0$ remains fixed. For this reason, it may be better to consider the minimax operating point where $P_0 = P_1$. The term “minimax” refers to the fact that the decision rule, including both the threshold (or the critical region) and the nonlinearity $g$, is chosen to solve the problem

$$\text{minimize} \ \max_{i=0,1} P_i. \tag{3.1}$$


Actually, we shall consider the problem (3.1) in an asymptotic sense, similar to our method for the Neyman-Pearson formulation. Thus we attempt to maximize the rate at which $\max(P_0, P_1)$ converges to zero. Note that if the ROC is continuous, then we can always choose the critical region so that $P_0 = P_1$, and thus we actually maximize the rate at which the common value of $P_0$ and $P_1$ converges to 0. This rate is given approximately by the performance measure $S_2$, as derived in this section.

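We proceed by assuming, as in Chapter 2, that the critical region $A$ is given by $[n\gamma_1, n\gamma_2]$, where $\gamma_1$ and $\gamma_2$ are given by (2.7), so that the error probabilities are given approximately by (2.9). Now if we choose the parameter $\gamma = 0$ so that $v(\gamma) = \mu_1 - \mu_0$, then (assuming $\sigma_0 > \sigma_1$) the expressions for the error probabilities reduce to

$$
\begin{aligned}
P_0 &= \Phi\Big[\frac{\sqrt{n}\,(\mu_1 - \mu_0)}{\sigma_0 - \sigma_1}\Big] - \Phi\Big[\frac{\sqrt{n}\,(\mu_1 - \mu_0)}{\sigma_0 + \sigma_1}\Big], \\
P_1 &= \Phi\Big[\frac{-\sqrt{n}\,(\mu_1 - \mu_0)}{\sigma_0 - \sigma_1}\Big] + \Phi\Big[\frac{-\sqrt{n}\,(\mu_1 - \mu_0)}{\sigma_0 + \sigma_1}\Big].
\end{aligned} \tag{3.2}
$$

Now typically the term $\Phi[-\sqrt{n}\,(\mu_1 - \mu_0)/(\sigma_0 - \sigma_1)]$ is orders of magnitude smaller than the term $\Phi[-\sqrt{n}\,(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)]$, and in fact the ratio of these two terms goes to zero:

$$\lim_{n\to\infty}\frac{\Phi[-\sqrt{n}\,(\mu_1 - \mu_0)/(\sigma_0 - \sigma_1)]}{\Phi[-\sqrt{n}\,(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)]} = 0.$$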

Thus we may approximate the first term of the expression for $P_0$ by 1 and the first term of the expression for $P_1$ by 0. The error probabilities are then given approximately by

$$P_0 = P_1 = \Phi\Big[\frac{-\sqrt{n}\,(\mu_1 - \mu_0)}{\sigma_0 + \sigma_1}\Big]. \tag{3.3}$$

It should be clear from (3.3) that the quantity $(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)$ determines (approximately) the rate at which $P_0$ and $P_1$ converge to 0. If we assume, as we did in Chapter 2, that $\mu_1 > \mu_0$, then maximizing this quantity is equivalent to maximizing

$$S_2 = \frac{(\mu_1 - \mu_0)^2}{(\sigma_0 + \sigma_1)^2}. \tag{3.4}$$
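As a sanity check on (3.3), consider a simplification that the text does not make: exactly Gaussian $T_n$ (mean $n\mu_i$, variance $n\sigma_i^2$) and a single threshold $n\gamma^*$ instead of the interval critical region. Choosing $\gamma^* = (\sigma_1\mu_0 + \sigma_0\mu_1)/(\sigma_0 + \sigma_1)$ makes $(\gamma^* - \mu_0)/\sigma_0 = (\mu_1 - \gamma^*)/\sigma_1 = (\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)$, so both error probabilities equal $\Phi(-\sqrt{n S_2})$. The parameter values in the sketch are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.cdf is Phi


def equal_error_threshold(mu0, mu1, s0, s1):
    """gamma* with (gamma* - mu0) / s0 == (mu1 - gamma*) / s1."""
    return (s1 * mu0 + s0 * mu1) / (s0 + s1)


def error_probabilities(mu0, mu1, s0, s1, n):
    """P0 = P[T_n > n gamma* | H0], P1 = P[T_n < n gamma* | H1] for exactly Gaussian T_n."""
    gamma = equal_error_threshold(mu0, mu1, s0, s1)
    p0 = STD.cdf(-math.sqrt(n) * (gamma - mu0) / s0)
    p1 = STD.cdf(math.sqrt(n) * (gamma - mu1) / s1)
    return p0, p1


def s2(mu0, mu1, s0, s1):
    return (mu1 - mu0) ** 2 / (s0 + s1) ** 2


mu0, mu1, s0, s1 = 0.0, 1.0, 1.0, 2.0  # hypothetical values
for n in (25, 100, 400):
    p0, p1 = error_probabilities(mu0, mu1, s0, s1, n)
    print(n, p0, p1, STD.cdf(-math.sqrt(n * s2(mu0, mu1, s0, s1))))
```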


This performance measure $S_2$ determines approximately the rate at which $P_0$ and $P_1$ converge to 0. It has also been derived in [3], using Chernoff bounds, as a nonlocal approach to the signal detection problem (1.11). The approach here is different in that the focus is on signal discrimination. Note that the performance measure $S_2$ treats the asymptotic variances under the two hypotheses equally, whereas this is not the case for the performance measure $S_1$. However, in the remainder of this chapter we shall see that $S_2$ is much more difficult to work with analytically because of the nonlinear function of $\sigma_0^2$ and $\sigma_1^2$ in the denominator.


3.2  The  Optimal  Nonlinearity 

As in the case of the performance measure $S_1$ of the last chapter, the nonlinearity $g_2$ which maximizes the performance measure $S_2$ is also given by the solution of an integral equation; however, the integral equation for this case is nonlinear, and we shall not be able to obtain a closed-form solution. The technique used to derive the integral equation is similar to that used earlier, and we will again assume that the processes are $m$-dependent. The value of $S_2(g)$ remains unchanged if $g$ is multiplied by a constant, and we may therefore attempt to maximize $(\mu_1 - \mu_0)^2$ with $(\sigma_0 + \sigma_1)^2$ constrained to be constant. Equivalently, we can consider the following optimization problem, which involves a Lagrange multiplier $\lambda$:

$$\text{maximize} \quad \mu_1 - \mu_0 - \lambda\, (\sigma_0 + \sigma_1)^2. \tag{3.5}$$
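To solve this problem, define $J(g) = \mu_1(g) - \mu_0(g) - \lambda[\sigma_0(g) + \sigma_1(g)]^2$, and let $\delta g$ be an arbitrary continuous function satisfying $E_i\, \delta g^2(X_1) < \infty$ for $i = 0, 1$. A necessary condition for $g$ to solve (3.5) is that $\frac{d}{d\epsilon} J(g + \epsilon\, \delta g)\big|_{\epsilon=0} = 0$. To show the steps involved in taking the derivative, write $J(g)$ in the following form:

$$J(g) = \mu_1(g) - \mu_0(g) - \lambda\, \sigma_0^2(g) - \lambda\, \sigma_1^2(g) - 2\lambda\sqrt{\sigma_0^2(g)\, \sigma_1^2(g)}. \tag{3.6}$$

Also define $A_i(g, \delta g)$ by $2A_i(g, \delta g) = \frac{d}{d\epsilon}\sigma_i^2(g + \epsilon\, \delta g)\big|_{\epsilon=0}$. Then $A_i(g, \delta g)$ is given by

$$A_i(g, \delta g) = E_i\, g(X_1)\, \delta g(X_1) + \sum_{j=1}^{m}\big[E_i\, g(X_1)\, \delta g(X_{j+1}) + E_i\, g(X_{j+1})\, \delta g(X_1)\big] - (2m+1)\, E_i\, g(X_1)\, E_i\, \delta g(X_1). \tag{3.7}$$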


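Now we have, for the contribution of the last term on the right of (3.6),

$$
\begin{aligned}
\frac{d}{d\epsilon}\sqrt{\sigma_0^2(g + \epsilon\, \delta g)\, \sigma_1^2(g + \epsilon\, \delta g)}\,\Big|_{\epsilon=0} &= [\sigma_0^2(g)\, \sigma_1^2(g)]^{-1/2}\,\big[\sigma_0^2(g)\, A_1(g, \delta g) + \sigma_1^2(g)\, A_0(g, \delta g)\big] \\
&= \frac{\sigma_0(g)}{\sigma_1(g)}\, A_1(g, \delta g) + \frac{\sigma_1(g)}{\sigma_0(g)}\, A_0(g, \delta g).
\end{aligned} \tag{3.8}
$$

The contributions of the other terms are immediate. Thus we have

$$\frac{d}{d\epsilon} J(g + \epsilon\, \delta g)\Big|_{\epsilon=0} = E_1\, \delta g(X_1) - E_0\, \delta g(X_1) - 2\lambda\Big(1 + \frac{\sigma_1}{\sigma_0}\Big)A_0(g, \delta g) - 2\lambda\Big(1 + \frac{\sigma_0}{\sigma_1}\Big)A_1(g, \delta g). \tag{3.9}$$

We obtain the integral equation by introducing the densities $f_0$, $f_1$, $f_0^j$ ($j = 1, \ldots, m$), and $f_1^j$ ($j = 1, \ldots, m$) into (3.9) and setting the result equal to 0. This yields

$$
\begin{aligned}
0 = \int \delta g(x)\Big\{ & f_1(x) - f_0(x) - 2\lambda\Big(1 + \frac{\sigma_1}{\sigma_0}\Big)\Big[g(x)\, f_0(x) + \int K_0(x,y)\, g(y)\, dy\Big] \\
& - 2\lambda\Big(1 + \frac{\sigma_0}{\sigma_1}\Big)\Big[g(x)\, f_1(x) + \int K_1(x,y)\, g(y)\, dy\Big]\Big\}\, dx,
\end{aligned} \tag{3.10}
$$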
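where $K_i$ is given by

$$K_i(x,y) = \sum_{j=1}^{m}\big[f_i^j(x,y) + f_i^j(y,x)\big] - (2m+1)\, f_i(x)\, f_i(y). \tag{3.11}$$

(Note that $K_1$ here differs from the kernel (2.18) of Chapter 2.) Equation (3.10) holds for arbitrary $\delta g$ iff the expression in braces is identically zero. Thus $g$ must solve the integral equation

$$2\lambda\, g(x) = \frac{f_1(x) - f_0(x)}{(1+\tau)\, f_0(x) + (1+\tau^{-1})\, f_1(x)} - 2\lambda\int L(x,y)\, g(y)\, dy, \tag{3.12}$$

where $\tau = \sigma_1/\sigma_0$ and the kernel $L$ is given by

$$L(x,y) = \frac{(1+\tau)\, K_0(x,y) + (1+\tau^{-1})\, K_1(x,y)}{(1+\tau)\, f_0(x) + (1+\tau^{-1})\, f_1(x)}. \tag{3.13}$$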

Again we observe that $\lambda$ determines the scaling of $g$. Thus the particular value of $\lambda$ is not significant, except that it must have the proper sign so that if $g$ solves the integral equation (3.12), then $\mu_1(g) > \mu_0(g)$. Consider now

$$
\begin{aligned}
[\sigma_0(g) + \sigma_1(g)]^2 &= (1+\tau)\, \sigma_0^2(g) + (1+\tau^{-1})\, \sigma_1^2(g) \\
&= \int g(x)\Big\{\big[(1+\tau)\, f_0(x) + (1+\tau^{-1})\, f_1(x)\big]g(x) + \int\big[(1+\tau)\, K_0(x,y) + (1+\tau^{-1})\, K_1(x,y)\big]g(y)\, dy\Big\}\, dx.
\end{aligned} \tag{3.14}
$$

If $g$ solves the integral equation (3.12) with $\lambda = \frac{1}{2}$, then the expression in braces in (3.14) reduces to $f_1(x) - f_0(x)$, and thus $[\sigma_0(g) + \sigma_1(g)]^2 = \mu_1(g) - \mu_0(g) > 0$. We shall therefore assign to $\lambda$ the value $\frac{1}{2}$ and henceforth consider the integral equation

$$g(x) = \frac{f_1(x) - f_0(x)}{(1+\tau)\, f_0(x) + (1+\tau^{-1})\, f_1(x)} - \int L(x,y)\, g(y)\, dy. \tag{3.15}$$

Furthermore, we observe that if $g_2$ solves the integral equation (3.15), then $S_2(g_2) = \mu_1(g_2) - \mu_0(g_2)$, a result which is similar to the one obtained in Chapter 2 for the optimal value of $S_1$. We have proved the following theorem.
Theorem 7. If the process $\{X_i\}$ is $m$-dependent under both $H_0$ and $H_1$, then a sufficient condition for $g_2$ to maximize $S_2$ is that $g_2$ solve the integral equation (3.15). Furthermore, if $g_2$ solves (3.15), then $S_2(g_2) = \mu_1(g_2) - \mu_0(g_2)$.

In comparing the integral equation (3.15) with the integral equation (2.19), we note first of all that (3.15) is nonlinear because $\tau$ is a function of $g$. Let us consider now what happens when $\tau$ varies. If $\tau$ is very small, then $\sigma_1$ is much smaller than $\sigma_0$, and thus the value of the performance measure $S_2$ is very close to that of the performance measure $S_0$. In fact, $S_2 \to S_0$ as $\tau \to 0$. Now observe that the integral equation (3.15), when rescaled, converges as $\tau \to 0$ to the integral equation (2.19), which maximizes the performance measure $S_1$. This provides some insight into the relation between the performance measures $S_0$, $S_1$, and $S_2$ and the role that $\tau$ plays in the integral equation (3.15). We observe, for example, that there is a conflict of objectives for very small $\tau$, in that the value of the performance measure $S_2$ is approximately equal to that of $S_0$, while the integral equation (3.15) provides a nonlinearity which is close to the one which maximizes the performance measure $S_1$. A similar conflict occurs as $\tau$ approaches $\infty$, with the roles of $S_0$ and $S_1$ reversed. Of course, there is no conflict of objectives if $S_0 = S_1$, but this implies that $\tau = 1$. Thus we expect that $\tau$ will have a “reasonable” value on the order of one. We find this to be the case in Chapter 5, where a numerical solution to the integral equation (3.15) is found.

3.3  The  Solution  of  the  Integral  Equation 

Equation (3.15) is nonlinear because $\tau$ is a function of $g$, and for this reason finding a closed-form solution is rather difficult. If, however, we had the clairvoyance to know the correct value of $\tau$, then we could find the solution $g_2$ by solving a linear integral equation. In fact, we might try to guess the value of $\tau$, find the solution of the resulting linear integral equation, and then compute $\tau$ to verify whether our guess was correct. This suggests an iterative method in which the value of $\tau$ computed from the previous solution becomes the new value of $\tau$ at the next iteration of the procedure. This method is used to obtain a numerical solution to (3.15) in Chapter 5; it is found that the successive values of $\tau$ do in fact converge.
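For iid processes ($m = 0$) the linear equation obtained for a fixed $\tau$ can be solved in closed form: with $K_i(x,y) = -f_i(x)\, f_i(y)$, (3.15) gives $g = [A f_0 + B f_1]/w_\tau$ with $A = (1+\tau)\mu_0 - 1$, $B = (1+\tau^{-1})\mu_1 + 1$ and $w_\tau = (1+\tau) f_0 + (1+\tau^{-1}) f_1$. Since $g + C$ also solves (3.15), one may take $A = 0$, i.e. $\mu_0 = 1/(1+\tau)$. The sketch below uses this to search for a self-consistent $\tau = \sigma_1(g)/\sigma_0(g)$. Two departures from the text are assumptions of the sketch: the densities ($H_0$: $N(0,1)$, $H_1$: $N(1, 1.5^2)$) are hypothetical, and the plain fixed-point iteration is replaced by bisection on $\log\tau$ for the residual $\tau - \sigma_1/\sigma_0$, assuming a sign change on the initial bracket. At the fixed point, Theorem 7's identity $(\sigma_0 + \sigma_1)^2 = \mu_1 - \mu_0$ should hold.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


STEP = 0.005
GRID = [-12.0 + STEP * k for k in range(int(round(26.0 / STEP)) + 1)]  # [-12, 14]
F0 = [normal_pdf(x, 0.0, 1.0) for x in GRID]   # hypothetical H0 marginal
F1 = [normal_pdf(x, 1.0, 1.5) for x in GRID]   # hypothetical H1 marginal


def integrate(values):
    """Trapezoidal rule on GRID."""
    return STEP * (sum(values) - 0.5 * (values[0] + values[-1]))


def g_for_tau(tau):
    """iid solution of (3.15) for fixed tau, normalized so that mu_0 = 1 / (1 + tau)."""
    shape = [b / ((1.0 + tau) * a + (1.0 + 1.0 / tau) * b) for a, b in zip(F0, F1)]
    scale = 1.0 / ((1.0 + tau) * integrate([s * a for s, a in zip(shape, F0)]))
    return [scale * s for s in shape]


def moments(g):
    """(mu_0, mu_1, sigma_0, sigma_1) of g(X_1) for iid observations."""
    mu0 = integrate([gi * a for gi, a in zip(g, F0)])
    mu1 = integrate([gi * b for gi, b in zip(g, F1)])
    s0 = math.sqrt(integrate([gi * gi * a for gi, a in zip(g, F0)]) - mu0 ** 2)
    s1 = math.sqrt(integrate([gi * gi * b for gi, b in zip(g, F1)]) - mu1 ** 2)
    return mu0, mu1, s0, s1


def ratio_map(tau):
    """tau -> sigma_1(g_tau) / sigma_0(g_tau), the update used by the text's iteration."""
    _, _, s0, s1 = moments(g_for_tau(tau))
    return s1 / s0


def solve_tau(lo=1e-3, hi=1e3, iters=60):
    """Bisection on log(tau) for tau = ratio_map(tau)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if mid < ratio_map(mid):
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


tau_star = solve_tau()
mu0, mu1, s0, s1 = moments(g_for_tau(tau_star))
print(tau_star, (s0 + s1) ** 2, mu1 - mu0)  # the last two should agree
```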

Although we cannot find a closed-form solution to (3.15), we may treat $\tau$ as a constant whose value is unknown, and thereby extend the analysis of equation (3.15). If we make the substitution

$$h(x) = g(x)\sqrt{(1+\tau)\, f_0(x) + (1+\tau^{-1})\, f_1(x)},$$

we obtain the integral equation

$$h(x) = \frac{f_1(x) - f_0(x)}{\sqrt{w_\tau(x)}} + \int L'(x,y)\, h(y)\, dy, \tag{3.16}$$

where the symmetric kernel $L'$ is given by

$$L'(x,y) = -\frac{(1+\tau)\, K_0(x,y) + (1+\tau^{-1})\, K_1(x,y)}{\sqrt{w_\tau(x)\, w_\tau(y)}} \tag{3.17}$$

and where $w_\tau$ is defined by

$$w_\tau(x) = (1+\tau)\, f_0(x) + (1+\tau^{-1})\, f_1(x). \tag{3.18}$$


For a given value of $\tau$, the integral equation (3.16) is a Fredholm equation of the second kind, provided we have the conditions

(a) $\displaystyle\int \frac{[f_1(x) - f_0(x)]^2}{w_\tau(x)}\, dx < \infty$,

(b) $\displaystyle\iint |L'(x,y)|^2\, dx\, dy < \infty$.
These conditions imply that the solution $h$ is square integrable, and it then follows, since $f_i/w_\tau$ is bounded, that $E_i\, g_2^2(X_1) < \infty$ for $i = 0, 1$. Note that we do not require the condition that $f_0(x)/f_1(x)$ be bounded, as we did for the integral equation (2.19). Condition (a) follows from the fact that $|f_1(x) - f_0(x)|/w_\tau(x)$ is bounded by $(1+\tau)^{-1} + (1+\tau^{-1})^{-1} = 1$. To show that condition (b) holds, it suffices to show that

$$\iint \frac{K_i^2(x,y)}{w_\tau(x)\, w_\tau(y)}\, dx\, dy < \infty, \qquad i = 0, 1, \tag{3.19}$$

since we may then apply the Minkowski inequality. If all the joint densities involved have the expansion (2.22), then the inequality (3.19) follows from an argument similar to that for the kernel $K_1'$ in Chapter 2. For example, consider the terms of the form

$$\frac{f_0^j(x,y)\, f_0^k(x,y)}{w_\tau(x)\, w_\tau(y)} \le \frac{f_0^j(x,y)\, f_0^k(x,y)}{(1+\tau)^2\, f_0(x)\, f_0(y)}.$$

It was shown in Chapter 2 that such terms are integrable. Thus condition (b) holds.
Since the integral equation (3.16) has a symmetric kernel, the Hilbert-Schmidt theory applies as in Chapter 2. If the eigenvalues and eigenfunctions of the kernel (3.17) are denoted by $\{\lambda_n\}$ and $\{\phi_n\}$, then a solution $h_2$ of the integral equation (3.16) has the expansion

$$h_2(x) = h_*(x) + \sum_{n=0}^{\infty}\frac{\lambda_n e_n}{1 - \lambda_n}\, \phi_n(x) = \sum_{n=0}^{\infty}\frac{e_n}{1 - \lambda_n}\, \phi_n(x), \tag{3.20}$$

where

$$h_*(x) = \frac{f_1(x) - f_0(x)}{\sqrt{w_\tau(x)}}$$

and

$$e_n = \int h_*(x)\, \phi_n(x)\, dx.$$
Note that $g_2(x) + C$ solves (3.15) for an arbitrary constant $C$, since $\int K_i(x,y)\, dy = -f_i(x)$ and therefore $\int L(x,y)\, dy = -1$. We may therefore take an arbitrary value for either $\mu_0$ or $\mu_1$. In fact, with $\tau$ fixed, the first two equations in (3.22) are linear in $\mu_0$ and $\mu_1$ and are singular. Therefore the system (3.22) does not have a unique solution.


3.4 The Performance Measure $S_3$

The performance measures $S_0$ and $S_1$ which were derived in Chapter 2 have the undesirable feature that they treat the performance under the two hypotheses unequally, and yet they are relatively convenient for analysis, since they lead to linear integral equations. On the other hand, the performance measure $S_2$ treats both hypotheses equally but leads to a nonlinear integral equation. We are led to consider also the performance measure

$$S_3 = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2 + \sigma_1^2}, \tag{3.23}$$

which treats both hypotheses equally and leads to a linear integral equation as well. This performance measure is derived in this section by considering Chernoff bounds for the error probabilities, and is actually a slight modification of the method of Sadowsky and Bucklew [3], in which they derived the performance measure $S_2$.

If the test statistic $T_n$ has a normal distribution with mean $n\mu_i$ and variance $n\sigma_i^2$, then the Chernoff bounds are

$$
\begin{aligned}
P[T_n > n\gamma] &\le \exp[-n I_i(\gamma)] \quad \text{if } \mu_i < \gamma, \\
P[T_n < n\gamma] &\le \exp[-n I_i(\gamma)] \quad \text{if } \mu_i > \gamma,
\end{aligned} \tag{3.24}
$$

where

$$I_i(\gamma) = \frac{(\mu_i - \gamma)^2}{2\sigma_i^2}. \tag{3.25}$$

The Chernoff bounds are asymptotically tight in the sense that

$$
\begin{aligned}
\lim_{n\to\infty} -\frac{1}{n}\log P[T_n > n\gamma] &= I_i(\gamma) \quad \text{if } \mu_i < \gamma, \\
\lim_{n\to\infty} -\frac{1}{n}\log P[T_n < n\gamma] &= I_i(\gamma) \quad \text{if } \mu_i > \gamma,
\end{aligned}
$$

and hence they can provide good approximations to the error probabilities if $n$ is large. Of course, if the distribution of $T_n$ is only approximately normal, then the bounds given by (3.24) are only approximations of the true Chernoff bounds. Nevertheless, we shall proceed under the assumption that such approximations are acceptable. From (3.24) we
see that if $n\gamma$ is the threshold and $\mu_0 < \gamma < \mu_1$, then $I_i$ determines (approximately) the bound for the error probability $P_i$. Since a larger value of $I_i$ results in a smaller bound for $P_i$, it is desirable to make both $I_0$ and $I_1$ as large as possible. Clearly $I_i$ is a convex function of $\gamma$ which takes its minimum value at $\gamma = \mu_i$. Thus as $\gamma$ increases from $\mu_0$ to $\mu_1$, $I_0$ increases and $I_1$ decreases. Sadowsky and Bucklew [3] proceeded from this point by maximizing $\min(I_0, I_1)$ and obtained the result that

$$S_2 = 2\max_{\mu_0 < \gamma < \mu_1}\ \min_{i=0,1} I_i(\gamma).$$

It is fairly straightforward, however, to show that

$$S_3 = 2\min_{\mu_0 < \gamma < \mu_1}\ [I_0(\gamma) + I_1(\gamma)],$$

and this justifies the use of the performance measure $S_3$.
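A short numerical illustration for hypothetical parameter values: the sketch checks the Gaussian Chernoff bound (3.24) and its exponent (3.25) at one threshold, and recovers $S_2 = (\mu_1 - \mu_0)^2/(\sigma_0 + \sigma_1)^2$ and $S_3 = (\mu_1 - \mu_0)^2/(\sigma_0^2 + \sigma_1^2)$ by a grid search over $\gamma \in [\mu_0, \mu_1]$, as twice the largest value of $\min(I_0, I_1)$ and twice the smallest value of $I_0 + I_1$, respectively.

```python
import math


def chernoff_exponent(gamma, mu, sigma):
    """Gaussian Chernoff exponent I(gamma) from (3.25)."""
    return (mu - gamma) ** 2 / (2.0 * sigma ** 2)


def upper_tail(z):
    """P[Z > z] for a standard normal Z, via erfc to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def exponent_estimate(n, gamma, mu, sigma):
    """-(1/n) log P[T_n > n gamma] when T_n ~ N(n mu, n sigma^2) and gamma > mu."""
    return -math.log(upper_tail(math.sqrt(n) * (gamma - mu) / sigma)) / n


def s2_s3_from_exponents(mu0, mu1, s0, s1, points=100001):
    """Grid search over thresholds gamma in [mu0, mu1]."""
    grid = [mu0 + (mu1 - mu0) * k / (points - 1) for k in range(points)]
    best_min = max(min(chernoff_exponent(g, mu0, s0), chernoff_exponent(g, mu1, s1))
                   for g in grid)
    least_sum = min(chernoff_exponent(g, mu0, s0) + chernoff_exponent(g, mu1, s1)
                    for g in grid)
    return 2.0 * best_min, 2.0 * least_sum


mu0, mu1, s0, s1 = 0.0, 1.0, 1.0, 2.0  # hypothetical values
s2_grid, s3_grid = s2_s3_from_exponents(mu0, mu1, s0, s1)
print(s2_grid, (mu1 - mu0) ** 2 / (s0 + s1) ** 2)
print(s3_grid, (mu1 - mu0) ** 2 / (s0 ** 2 + s1 ** 2))
print(exponent_estimate(1000, 0.5, 0.0, 1.0), chernoff_exponent(0.5, 0.0, 1.0))
```

The exponent estimate approaches $I_0(\gamma)$ slowly, with a correction of order $(\log n)/n$, which is why the bounds are useful only for large $n$.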

The following method for maximizing $S_3$ is very similar to the method in Chapter 2 for maximizing $S_1$, and we must again assume that the processes are $m$-dependent. We know that $g$ maximizes $S_3$ if and only if $g$ maximizes $J_3(g) = \mu_1(g) - \mu_0(g) - \lambda[\sigma_0^2(g) + \sigma_1^2(g)]$, where $\lambda$ is a Lagrange multiplier. The condition that the Gateaux variation of $J_3$ vanish at $g$ leads to the integral equation

$$2\lambda\, g(x) = \frac{f_1(x) - f_0(x)}{f_0(x) + f_1(x)} - 2\lambda\int M(x,y)\, g(y)\, dy, \tag{3.26}$$

where the kernel $M$ is given by

$$M(x,y) = \frac{K_0(x,y) + K_1(x,y)}{f_0(x) + f_1(x)}.$$

The functional $J_3$ is concave in $g$ provided $\lambda > 0$, so the condition that $g$ solve the integral equation (3.26) is also a sufficient condition for $g$ to maximize $S_3$. We therefore take $\lambda = \frac{1}{2}$, and the integral equation (3.26) becomes

$$g(x) = \frac{f_1(x) - f_0(x)}{f_0(x) + f_1(x)} - \int M(x,y)\, g(y)\, dy. \tag{3.27}$$
We can also show easily that if $g_3$ solves the integral equation (3.27), then $\sigma_0^2(g_3) + \sigma_1^2(g_3) = \mu_1(g_3) - \mu_0(g_3)$, so that the optimal value of $S_3$ is $S_3(g_3) = \mu_1(g_3) - \mu_0(g_3)$. These results are summarized in Theorem 8.

Theorem 8. If the process $\{X_i\}$ is $m$-dependent under both $H_0$ and $H_1$, then a sufficient condition for $g_3$ to maximize $S_3$ is that $g_3$ solve the integral equation (3.27). Furthermore, if $g_3$ solves (3.27), then $S_3(g_3) = \mu_1(g_3) - \mu_0(g_3)$.


The  integral  equation  (3.27)  can  be  transformed  into  an  integral  equation  with 


a  symmetric  kernel  by  making  the  substitution  h(x)  =  g(x)y/f0(x)  +  f\(x)  so  that  the 
Hilbert-Schmidt  theory  applies  as  before.  We  shall  not  pursue  this  further.  We  shall, 
however,  proceed  to  find  the  optimal  nonlinearity  for  iid  processes.  If  the  processes  are 
both  iid,  then  the  kernel  M  has  the  form 

...  .  A(*)A(»)  +  /i(*)/i(») 

— mrm — 

and  the  integral  equation  gives  us  immediately  the  form  of  the  iid  solution: 


$$g_{\mathrm{iid}}(x) = \frac{B_0 f_0(x) + B_1 f_1(x)}{f_0(x) + f_1(x)} \qquad (3.28)$$


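where $B_0 = \mu_0 - 1$ and $B_1 = \mu_1 + 1$. To find the unknown constants $B_0$, $B_1$, we substitute for $g_{\mathrm{iid}}$ in the linear equations

$$\mu_0 = B_0 + 1 = \int g_{\mathrm{iid}}(x) f_0(x)\, dx, \qquad \mu_1 = B_1 - 1 = \int g_{\mathrm{iid}}(x) f_1(x)\, dx. \qquad (3.29)$$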

The system (3.29) is in fact singular (substituting (3.28), both equations reduce to the single condition $(B_1 - B_0)\int f_0(x) f_1(x)/[f_0(x) + f_1(x)]\, dx = 1$), so that we may take either $B_0$ or $B_1$ to be arbitrary. Therefore we shall arbitrarily take $B_0 = 0$, and this gives us the value

$$B_1 = \left[\int \frac{f_0(x) f_1(x)}{f_0(x) + f_1(x)}\, dx\right]^{-1}. \qquad (3.30)$$

Thus the iid solution has been determined explicitly.
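The following sketch is a numerical sanity check of (3.28) to (3.30), not part of the original development. It assumes unit-variance Gaussian densities with means 0 and 1 for $f_0$ and $f_1$, and a trapezoid rule on a finite interval; these choices and the helper names are illustrative. With $B_0 = 0$ and $B_1$ from (3.30), both equations of (3.29) should hold, and $\sigma_0^2 + \sigma_1^2$ should equal $\mu_1 - \mu_0$, as Theorem 8 asserts for the iid case.

```python
import math


def gauss_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def integrate(fn, a=-12.0, b=13.0, n=20000):
    """Composite trapezoid rule on [a, b]; the Gaussian tails outside are negligible."""
    h = (b - a) / n
    return h * (0.5 * (fn(a) + fn(b)) + sum(fn(a + k * h) for k in range(1, n)))


def f0(x):
    return gauss_pdf(x, 0.0)  # assumed density under H0


def f1(x):
    return gauss_pdf(x, 1.0)  # assumed density under H1


# (3.30) with the arbitrary choice B0 = 0
b0 = 0.0
b1 = 1.0 / integrate(lambda x: f0(x) * f1(x) / (f0(x) + f1(x)))


def g_iid(x):
    """(3.28): [B0 f0(x) + B1 f1(x)] / [f0(x) + f1(x)]."""
    return (b0 * f0(x) + b1 * f1(x)) / (f0(x) + f1(x))


mu0 = integrate(lambda x: g_iid(x) * f0(x))
mu1 = integrate(lambda x: g_iid(x) * f1(x))
var0 = integrate(lambda x: g_iid(x) ** 2 * f0(x)) - mu0 ** 2
var1 = integrate(lambda x: g_iid(x) ** 2 * f1(x)) - mu1 ** 2
s3 = (mu1 - mu0) ** 2 / (var0 + var1)

print(f"B1 = {b1:.4f}   mu0 = {mu0:.6f} (B0 + 1)   mu1 = {mu1:.6f} (B1 - 1)")
print(f"sigma0^2 + sigma1^2 = {var0 + var1:.6f}   mu1 - mu0 = {mu1 - mu0:.6f}   S3 = {s3:.6f}")
```

Because (3.29) is singular, a different choice of $B_0$ only adds a constant to $g_{\mathrm{iid}}$, which leaves $S_3$ unchanged.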
3.5 Extension to φ-Mixing Processes

The results proved in Section 2.4 are also true for the performance measures $S_2$ and $S_3$. Because of the similarity of these two performance measures, the proofs for the two cases are nearly identical; therefore only the results for $S_2$ will be stated and proved. The notation is as follows: Define the density $w = \frac{1}{2}(f_0 + f_1)$ and let $L^2(w)$ be the Hilbert space of Borel functions $g$ such that $\int g^2(x) w(x)\, dx < \infty$, with the inner product $\langle g, h \rangle = \int g(x) h(x) w(x)\, dx$. Note that $g \in L^2(w)$ implies that $E_i g^2(X_1) < \infty$ for $i = 0, 1$. Let $\sigma_{i,m}^2(g)$ be defined by (2.28) and define

$$S_2^{(m)}(g) = \frac{[\mu_1(g) - \mu_0(g)]^2}{[\sigma_{0,m}(g) + \sigma_{1,m}(g)]^2}. \qquad (3.31)$$

We denote by $C$ the class of joint densities which have a diagonal expansion, as in Section 2.4.

Lemma 9. The functional $S_2^{(m)}$ is continuous.

Proof. It was shown in the proof of Lemma 4 that the numerator of (3.31) is continuous and that $\sigma_{i,m}^2(g)$ is continuous. Since sums, square roots, quotients, etc. preserve continuity, it follows that $S_2^{(m)}$ is continuous. □

Lemma 10. If the joint densities $f_i^{(j)}$, $i = 0, 1$, $j = 1, 2, \ldots$, are in $C$, then the functional $S_2$ is continuous.

Proof. The proof consists of showing that $S_2^{(m)}(g)$ converges to $S_2(g)$ uniformly for $g \in L^2(w)$. Define $Q_i$ to be the class of all functions $g \in L^2(w)$ such that $E_i g(X_1) = 0$ and $E_i g^2(X_1) = 1$. From the proof of Lemma 5, it follows that $\sigma_{i,m}^2(g)$ converges uniformly for $g \in Q_i$. Now suppose that $g$ is an arbitrary nonconstant element of $L^2(w)$. Then there exist constants $a_i$ and $b_i$ such that $g_i = a_i g + b_i \in Q_i$, and obviously $\sigma_{i,m}^2(g_i) = a_i^2 \sigma_{i,m}^2(g)$ for all $m$ and $\sigma_i^2(g_i) = a_i^2 \sigma_i^2(g)$. Let $c = |a_1/a_0|$. Then we can write

(3.32)

where $\zeta_0$ and $\zeta_1$ converge to 0 uniformly for $g \in L^2(w)$ as $m \to \infty$. The right-hand side of (3.32) is continuous as a function of $c$ for $c \in [0, \infty)$ and approaches 0 as $c$ approaches $\infty$. Thus for $\zeta_0$ and $\zeta_1$ fixed, the right-hand side attains its maximum as $c$ varies, and this maximum converges to 0 as $\zeta_0$ and $\zeta_1$ approach 0. Hence the convergence of $S_2^{(m)}$ is uniform. □

We  are  now  ready  to  prove  the  main  result. 

Theorem 11. If the joint densities $f_i^{(j)}$, $i = 0, 1$, $j = 1, 2, \ldots$, are in $C$, then there exists a function $g_2 \in L^2(w)$ which maximizes $S_2$. If $g_2^{(m)}$ solves the integral equation (3.15), then the sequence $\{S_2(g_2^{(m)})\}$ converges to $S_2(g_2)$ as $m \to \infty$.

Proof. Define $Q'$ to be the subset of $L^2(w)$ consisting of the vectors $g$ satisfying $\int g^2(x) w(x)\, dx = 1$. Since $S_2$ is continuous and the set $Q'$ is compact, there exists an element $g_2$ of $Q'$ such that $S_2$ achieves its maximum value on the set $Q'$ at $g_2$. If $g$ is any nonconstant element of $L^2(w)$, then there exist constants $a$ and $b$ such that $ag + b \in Q'$. But $S_2(g) = S_2(ag + b) \le S_2(g_2)$. Thus $g_2$ maximizes $S_2$.

Let $\epsilon > 0$. By the proof of Lemma 10, $S_2^{(m)}$ converges uniformly to $S_2$. Thus there exists an integer $M$ such that for every $m > M$ and every $g \in L^2(w)$ we have $|S_2^{(m)}(g) - S_2(g)| < \epsilon$. Let $m > M$ be fixed; recall that $g_2^{(m)}$ maximizes $S_2^{(m)}$ and $g_2$ maximizes $S_2$. If $S_2^{(m)}(g_2^{(m)}) \le S_2(g_2)$, then we must have $S_2^{(m)}(g_2) \le S_2^{(m)}(g_2^{(m)}) \le S_2(g_2)$. Otherwise, we have $S_2(g_2^{(m)}) \le S_2(g_2) < S_2^{(m)}(g_2^{(m)})$. In either case, $|S_2^{(m)}(g_2^{(m)}) - S_2(g_2)| < \epsilon$, and this implies that $|S_2(g_2^{(m)}) - S_2(g_2)| < 2\epsilon$. □
CHAPTER 4


MINIMAX ROBUSTNESS


4.1 The Robustness Problem

When  the  actual  probability  distributions  of  the  observed  processes  are  precisely 
the  same  as  the  distributions  which  were  assumed  in  deriving  a  particular  test,  then 
this  is  referred  to  as  a  matched  situation.  The  performance  of  a  test  in  the  matched 
situation  is  certainly  an  important  consideration.  However,  also  of  great  importance  is 
the  performance  of  the  test  under  the  mismatched  situation,  where  the  actual  probability 
distributions  are  close  to  but  slightly  different  from  the  assumed  distributions.  If  a  given 
decision  rule  performs  relatively  well  in  the  mismatched  situation,  as  compared  to  the 
matched  situation,  then  such  a  decision  rule  is  said  to  be  robust.  It  is  the  purpose  of  this 
chapter to address the issue of robustness in relation to the performance measures $S_1$ and $S_3$ and the corresponding optimal test statistics as given in the preceding chapters.

To begin, we must first make more precise mathematically the discussion of the preceding paragraph. One approach to robustness which has become very popular and which will be considered here is minimax robustness, a game-theoretic approach. Define $Q_0$ and $Q_1$ to be the classes of distributions which are possible under the hypotheses $H_0$ and $H_1$, respectively. These uncertainty classes are to contain the “nominal” distributions (those distributions which are assumed initially) as well as those distributions which are only slightly different from the nominal distributions. Now define the least favorable distributions $F_0^* \in Q_0$ and $F_1^* \in Q_1$ to be the distributions which together with the nonlinearity $g^*$ form a saddle point for the performance measure $S$ as follows:

$$S(g, F_0^*, F_1^*) \le S(g^*, F_0^*, F_1^*) \le S(g^*, F_0, F_1) \qquad (4.1)$$

where $g$ is any other allowable nonlinearity and $F_0$ and $F_1$ are arbitrary distributions from the classes $Q_0$ and $Q_1$. In this case $g^*$ is called a minimax robust nonlinearity, and has the property that for any pair of distributions in $Q_0$ and $Q_1$, the value of $S$ evaluated at $g^*$ is guaranteed to be at least $S(g^*, F_0^*, F_1^*)$.

The idea of this approach is like that of a game in which nature chooses the distributions $F_0$, $F_1$ out of the classes $Q_0$, $Q_1$ and the human player chooses the nonlinearity $g$, the performance measure $S(g, F_0, F_1)$ being the payoff. The first inequality in (4.1) is usually not difficult to show. Indeed, if the least favorable distributions are known, then finding the nonlinearity $g^*$ is merely the problem considered in Chapter 2 or Chapter 3. What is usually more difficult is finding the least favorable distributions $F_0^*$, $F_1^*$ and showing the second inequality in (4.1). Obviously $g^*$, $F_0^*$, and $F_1^*$ solve the minimax problem

$$\min_{F_0, F_1} \max_g S(g, F_0, F_1) \qquad (4.2)$$

and in fact, solving (4.2) is the simplest way to find $F_0^*$ and $F_1^*$, if they exist. That the solution $g^*$, $F_0^*$, $F_1^*$ of (4.2) satisfies the left inequality in (4.1) is obvious, and the main task in proving a result in robustness is showing that such a solution also satisfies the right inequality. Equivalently, one might try to show that $g^*$, $F_0^*$, $F_1^*$ solve the maximin problem

$$\max_g \min_{F_0, F_1} S(g, F_0, F_1) \qquad (4.3)$$

since the solution of (4.3) satisfies the right inequality in (4.1).

If one defines a metric on some class $M$ of probability distributions, then a quite natural way to define an uncertainty class $Q$ about a nominal distribution $F$ is to include all distributions in $M$ which are at a distance $\epsilon$ or less from $F$. Such a definition might be appropriate for minimax robustness if one wishes to exploit continuity properties of a performance measure, since by continuity there will be a small change in the performance if the distance between the distributions is small. The $\epsilon$-contamination class which we shall consider here is useful in a different sense and is defined by $Q = \{G : G = (1 - \epsilon)F + \epsilon H,\ H \in M\}$. Evidently, every distribution in the class $Q$ is a mixture of the nominal distribution $F$ and some unknown distribution $H$ with weights $(1 - \epsilon)$ and $\epsilon$. The corresponding physical interpretation is that an observation comes from the distribution $F$ with probability $(1 - \epsilon)$ and from the distribution $H$ with probability $\epsilon$. Thus if $F$ is a univariate distribution and the process is iid, then out of $n$ observations, approximately $(1 - \epsilon)n$ will be from the distribution $F$ and approximately $\epsilon n$ will be “corrupted” observations from the distribution $H$. Therefore in the iid case such an uncertainty class has a pleasing physical interpretation. However, if the process is not iid, as we wish to assume, then $F$ is an $n$-dimensional distribution and the interpretation is that with probability $\epsilon$ the distribution is completely unknown. This interpretation is not particularly desirable, and we shall therefore modify this particular uncertainty model.
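The mixture interpretation of the iid $\epsilon$-contamination model can be illustrated by direct sampling. In this sketch the nominal $F$ is taken to be standard normal and the contaminating $H$ a standard Cauchy distribution; both choices, the value of $\epsilon$, and the function name are assumptions made only for illustration. Each observation independently comes from $H$ with probability $\epsilon$, so roughly $\epsilon n$ of $n$ observations are corrupted.

```python
import math
import random


def sample_contaminated(n, eps, rng):
    """Draw n iid observations from (1 - eps) F + eps H.

    Illustrative assumptions: F = N(0, 1) (nominal) and H = standard Cauchy.
    Returns the samples and, for each sample, whether it came from H.
    """
    samples, from_h = [], []
    for _ in range(n):
        if rng.random() < eps:
            # Standard Cauchy by inversion: tan(pi * (U - 1/2)).
            samples.append(math.tan(math.pi * (rng.random() - 0.5)))
            from_h.append(True)
        else:
            samples.append(rng.gauss(0.0, 1.0))
            from_h.append(False)
    return samples, from_h


rng = random.Random(12345)
n, eps = 100_000, 0.05
xs, flags = sample_contaminated(n, eps, rng)
frac = sum(flags) / n
print(f"fraction of corrupted observations: {frac:.4f} (eps = {eps})")
```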

Since the performance measures we have derived involve only the marginal and bivariate joint densities, we shall attempt to define uncertainty classes which involve only these densities. In particular, we shall assume that the nominal distribution for our uncertainty class $Q$ is iid with marginal density $f$. The class $Q$ is then defined to contain all stationary process distributions $F$ such that the marginal density corresponding to $F$ is contained in an $\epsilon$-contamination class about the nominal $f$, and such that the bivariate
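joint distributions satisfy the condition

$$\sup_g \frac{|\mathrm{Cov}[g(X_1), g(X_{j+1})]|}{[\mathrm{Var}\, g(X_1)\, \mathrm{Var}\, g(X_{j+1})]^{1/2}} \le r_j, \qquad j = 1, 2, \ldots \qquad (4.4)$$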


where $g$ ranges over all measurable functions satisfying $E g^2(X_1) < \infty$. Since we assume stationarity, the denominator is actually just $\mathrm{Var}\, g(X_1)$. The condition (4.4) is our way of allowing for some uncertainty in the dependency structure of the process, so that not every process in the class $Q$ will be iid. Thus the uncertainty class $Q$ is specified by giving the nominal density $f$, the parameter $\epsilon$, and the sequence $\{r_j\}$.

In the analysis that follows, the least favorable marginal and bivariate joint distributions are derived, and two issues regarding these least favorable distributions must be addressed. First, it must be shown that there do in fact exist stochastic processes having the prescribed distributions, and second, it must be shown that the processes satisfy a mixing condition, so that the central limit theory may be applied as in the preceding chapters. The necessary results for these two issues have been derived by Sadowsky [9] and we shall adapt them as needed. For a fixed marginal distribution function $F$, the least favorable bivariate distribution functions are given by ($F_j$ being the joint distribution function for $X_1$ and $X_{j+1}$)

$$F_j(x,y) = (1 - r_j) F(x) F(y) + r_j F(x \wedge y), \qquad j = 1, 2, \ldots \qquad (4.5a)$$

where $x \wedge y$ is the minimum of $x$ and $y$. If the distribution function $F$ has a density $f$, then we may write for the bivariate densities


(4.7) 


and let $g$ be a Borel function such that $E|g(X_1)|^{2+\delta} < \infty$ if $\delta < \infty$, or such that $|g(X_1)|$ is almost surely bounded if $\delta = \infty$. Then the sum $\sigma^2(g)$ defined in (1.8) converges, and $[T_n(X) - n\mu(g)]/\sqrt{n\sigma^2(g)}$ converges in distribution to a standard normal random variable, provided $\sigma^2(g) > 0$.

Further results from [9] show that the process is strong mixing and the condition (4.7) is satisfied if there exist constants $K > 0$ and $\epsilon > 0$ such that $r_j \le K e^{-\epsilon j}$. Thus if the $r$-sequence is dominated by an exponential sequence, the condition (4.7) holds. To apply the theorem, then, it remains only to show that $E|g(X_1)|^{2+\delta} < \infty$, and this holds in particular if $g$ is bounded. Thus the two issues mentioned above are resolved.
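One way to see that (4.5a) meets the bound (4.4) with equality for every $g$ is to note that a pair with distribution function $F_j$ can be generated by setting $X_{j+1} = X_1$ with probability $r_j$ and drawing $X_{j+1}$ independently from $F$ otherwise; then $\mathrm{Cov}[g(X_1), g(X_{j+1})] = r_j \mathrm{Var}\, g(X_1)$ for any $g$. The Monte Carlo sketch below checks this for a standard normal $F$ and a few test functions $g$; the marginal, the value of $r_j$, the sample size, and the helper names are assumptions for illustration only.

```python
import random


def sample_pair_45a(r, rng):
    """One draw of (X_1, X_{j+1}) from (4.5a), with F = N(0, 1) as an assumed marginal."""
    x = rng.gauss(0.0, 1.0)
    y = x if rng.random() < r else rng.gauss(0.0, 1.0)  # copy with prob. r, else independent
    return x, y


def cov_to_var_ratio(pairs, g):
    """Sample estimate of Cov[g(X_1), g(X_{j+1})] / Var g(X_1)."""
    gx = [g(x) for x, _ in pairs]
    gy = [g(y) for _, y in pairs]
    n = len(pairs)
    mx, my = sum(gx) / n, sum(gy) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(gx, gy)) / n
    var = sum((a - mx) ** 2 for a in gx) / n
    return cov / var


rng = random.Random(2024)
r_j = 0.3
pairs = [sample_pair_45a(r_j, rng) for _ in range(100_000)]
test_functions = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "sign": lambda x: 1.0 if x > 0 else -1.0,
    "clipped": lambda x: max(-1.0, min(1.0, x)),
}
ratios = {name: cov_to_var_ratio(pairs, g) for name, g in test_functions.items()}
for name, value in ratios.items():
    print(f"{name:>8}: {value:.3f}   (r_j = {r_j})")
```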

Because the condition (4.4) is defined in terms of a supremum over second-order functions $g$, it is not directly obvious whether a given bivariate distribution satisfies such a bound. In order to determine whether the bound holds for a given bivariate density $f_j$, it is useful to consider the diagonal expansion (2.22), if it exists. Any $g$ which satisfies $\int g^2(x) f(x)\, dx < \infty$ has an expansion $g(x) = \sum_n b_n \phi_n(x)$, and for such an expansion we have as well that

$$\int g^2(x) f(x)\, dx = \sum_n b_n^2, \qquad \iint g(x) g(y) f_j(x,y)\, dx\, dy = \sum_n b_n^2 a_n^{(j)}, \qquad \int g(x) f(x)\, dx = b_0.$$

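These expressions imply that

$$\frac{|\mathrm{Cov}[g(X_1), g(X_{j+1})]|}{\mathrm{Var}\, g(X_1)} = \frac{\left|\sum_{n \ge 1} a_n^{(j)} b_n^2\right|}{\sum_{n \ge 1} b_n^2}. \qquad (4.8)$$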
Since $\mathrm{Cov}[g(X_1), g(X_{j+1})]/\mathrm{Var}\, g(X_1)$ is invariant under the scaling of $g$, we can assume without loss of generality that $\sum_{n \ge 1} b_n^2 = 1$, so that the denominator of (4.8) is equal to 1. Then it is obvious that the sup in (4.4) is attained by $\phi_i$, where $i$ is such that $|a_i^{(j)}| = \max\{|a_n^{(j)}|, n \ge 1\}$. If the orthonormal functions $\{\phi_n\}$ are polynomials (that is, $f_j$ is in the class $C$ defined in Chapter 2), then this maximum coefficient occurs as either $a_1^{(j)}$ or $a_2^{(j)}$. To show this, we require a fact from [12]: for any such diagonal expansion in which the orthonormal functions are polynomials, there exists a probability density function $h_j$ having support in the interval $[-1, 1]$ such that $a_n^{(j)} = \int t^n h_j(t)\, dt$. Then for $n > 2$ it is obvious that

$$|a_n^{(j)}| \le \int |t|^n h_j(t)\, dt \le \int t^2 h_j(t)\, dt = |a_2^{(j)}|,$$

so that the assertion holds. Let $\xi_n = \int x^n f(x)\, dx$, $\zeta_{mn}^{(j)} = \iint x^m y^n f_j(x,y)\, dx\, dy$, and $\sigma^2 = \xi_2 - \xi_1^2$. Then for this case $a_1^{(j)}$ and $a_2^{(j)}$ are given by

$$a_1^{(j)} = \frac{\zeta_{11}^{(j)} - \xi_1^2}{\sigma^2},$$

$$a_2^{(j)} = \frac{\sigma^4 \zeta_{22}^{(j)} + 2\sigma^2(\xi_1\xi_2 - \xi_3)\zeta_{21}^{(j)} + (\xi_1\xi_2 - \xi_3)^2 \zeta_{11}^{(j)} - (\xi_2^2 - \xi_1\xi_3)^2}{\sigma^4 \xi_4 + 2\sigma^2(\xi_1\xi_2 - \xi_3)\xi_3 + (\xi_1\xi_2 - \xi_3)^2 \xi_2 - (\xi_2^2 - \xi_1\xi_3)^2}.$$
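The two coefficient formulas can be coded directly from the moments. The sketch below (function names are hypothetical) assumes a symmetric bivariate density, so that $\zeta_{21}^{(j)} = \zeta_{12}^{(j)}$, and checks the result on two cases where the coefficients are known: a standard bivariate Gaussian with correlation $\rho$, whose Mehler expansion gives $a_n = \rho^n$, and the least favorable pair (4.5a) with an exponential marginal, for which every $a_n^{(j)}$ with $n \ge 1$ equals $r_j$. For densities in the class $C$, $\max(|a_1^{(j)}|, |a_2^{(j)}|)$ is then the value of the supremum in (4.4).

```python
import math


def diagonal_coefficients(xi, zeta):
    """Return (a1, a2) for a symmetric bivariate density with common marginal.

    xi[n]        = E X^n for n = 1..4 (marginal moments)
    zeta[(m, n)] = E X^m Y^n for (m, n) in {(1, 1), (2, 1), (2, 2)}
    """
    x1, x2, x3, x4 = xi[1], xi[2], xi[3], xi[4]
    s2 = x2 - x1 ** 2          # sigma^2
    beta = x1 * x2 - x3        # q(x) = s2*x^2 + beta*x - gamma is orthogonal to 1 and x
    gamma = x2 ** 2 - x1 * x3
    a1 = (zeta[(1, 1)] - x1 ** 2) / s2
    num = s2 ** 2 * zeta[(2, 2)] + 2 * s2 * beta * zeta[(2, 1)] + beta ** 2 * zeta[(1, 1)] - gamma ** 2
    den = s2 ** 2 * x4 + 2 * s2 * beta * x3 + beta ** 2 * x2 - gamma ** 2
    return a1, num / den


def moments_45a(xi, r):
    """Joint moments under (4.5a): E X^m Y^n = (1 - r) xi_m xi_n + r xi_{m+n}."""
    return {(m, n): (1 - r) * xi[m] * xi[n] + r * xi[m + n] for (m, n) in [(1, 1), (2, 1), (2, 2)]}


rho = 0.6
gauss_xi = {1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}
gauss_zeta = {(1, 1): rho, (2, 1): 0.0, (2, 2): 1.0 + 2.0 * rho ** 2}
a1g, a2g = diagonal_coefficients(gauss_xi, gauss_zeta)

r_j = 0.3
exp_xi = {n: float(math.factorial(n)) for n in range(1, 5)}  # Exp(1) moments: n!
a1e, a2e = diagonal_coefficients(exp_xi, moments_45a(exp_xi, r_j))

print(f"Gaussian: a1 = {a1g:.4f}, a2 = {a2g:.4f}")
print(f"(4.5a), exponential marginal: a1 = {a1e:.4f}, a2 = {a2e:.4f}")
```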


In the sections that follow, we will consider the robustness problem (4.1) where the performance measure $S$ is either $S_1$ or $S_3$. We assume that under the hypothesis $H_i$, the true distribution of the observed process is in the class $Q_i$, which is defined as above by the nominal marginal density $f_i$, the parameter $\epsilon_i$, and an $r$-sequence which has the sum $R_i$. For given marginal densities, it will be shown that the bivariate distributions defined in (4.5) are in fact least favorable, and this reduces the problem (4.1) to one which involves only the marginal densities. We now give the least favorable marginal densities, which we will call the Huber-Strassen least favorable densities. These are


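$$p_0(x) = \begin{cases} (1 - \epsilon_0) f_0(x), & f_1(x)/f_0(x) < c'' \\ (1/c'')(1 - \epsilon_0) f_1(x), & f_1(x)/f_0(x) \ge c'' \end{cases}$$

$$p_1(x) = \begin{cases} (1 - \epsilon_1) f_1(x), & f_1(x)/f_0(x) > c' \\ c'(1 - \epsilon_1) f_0(x), & f_1(x)/f_0(x) \le c' \end{cases} \qquad (4.9)$$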

where the constants $c'$ and $c''$ are chosen such that the functions are valid probability densities (i.e., they integrate to 1). The Huber-Strassen densities have appeared frequently as the solution to various minimax robustness problems. Lemma 13 is the basis for many such applications.
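The constants $c'$ and $c''$ rarely have closed forms, but when the nominal likelihood ratio is monotone they can be found by a one-dimensional root search. The sketch below assumes Gaussian nominals $f_0 = N(0,1)$, $f_1 = N(\theta,1)$ with $\theta = 1$ and $\epsilon_0 = \epsilon_1 = 0.1$ (all illustrative choices, as are the helper names), solves for $c'$ and $c''$ by bisection in $\log c$, builds (4.9), and spot-checks Lemma 13 with $\Psi(x) = 1/x$ against the nominal pair, which belongs to the contamination classes of that lemma (take $h = f_i$).

```python
import math

THETA, EPS0, EPS1 = 1.0, 0.1, 0.1  # assumed: f0 = N(0, 1), f1 = N(THETA, 1)


def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f0(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f1(x):
    return f0(x - THETA)


def lr(x):
    """Nominal likelihood ratio f1(x)/f0(x), increasing in x for THETA > 0."""
    return math.exp(THETA * x - 0.5 * THETA ** 2)


def cut(c):
    """The point x at which lr(x) = c."""
    return (math.log(c) + 0.5 * THETA ** 2) / THETA


def mass_p0(c):
    """Total mass of p0 in (4.9) when c'' = c."""
    t = cut(c)
    return (1 - EPS0) * (phi_cdf(t) + (1 - phi_cdf(t - THETA)) / c)


def mass_p1(c):
    """Total mass of p1 in (4.9) when c' = c."""
    t = cut(c)
    return (1 - EPS1) * ((1 - phi_cdf(t - THETA)) + c * phi_cdf(t))


def solve_unit_mass(mass, lo=-20.0, hi=20.0):
    """Bisection in log c for mass(c) = 1; assumes one sign change on [e^lo, e^hi]."""
    above_at_lo = mass(math.exp(lo)) > 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (mass(math.exp(mid)) > 1.0) == above_at_lo:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


c_lo = solve_unit_mass(mass_p1)  # c'
c_hi = solve_unit_mass(mass_p0)  # c''


def p0(x):
    return (1 - EPS0) * (f0(x) if lr(x) < c_hi else f1(x) / c_hi)


def p1(x):
    return (1 - EPS1) * (f1(x) if lr(x) > c_lo else c_lo * f0(x))


def integrate(fn, a=-15.0, b=16.0, n=30000):
    """Composite trapezoid rule; tails outside [a, b] are negligible here."""
    h = (b - a) / n
    return h * (0.5 * (fn(a) + fn(b)) + sum(fn(a + k * h) for k in range(1, n)))


robust = integrate(lambda x: p0(x) ** 2 / p1(x))   # Lemma 13, Psi(x) = 1/x, at (p0, p1)
nominal = integrate(lambda x: f0(x) ** 2 / f1(x))  # same functional at the nominal pair
print(f"c' = {c_lo:.4f}, c'' = {c_hi:.4f}")
print(f"int p0^2/p1 = {robust:.4f} <= int f0^2/f1 = {nominal:.4f}")
```

With these choices the least favorable likelihood ratio $p_1/p_0$ is the nominal ratio $f_1/f_0$ clipped to $[c', c'']$, multiplied by $(1 - \epsilon_1)/(1 - \epsilon_0)$.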


Lemma 13. For $i = 0, 1$, let $\mathcal{P}_i$ be the class of all probability density functions of the form $f = (1 - \epsilon_i) f_i + \epsilon_i h$, where $f_i$ is fixed and $h$ is arbitrary, and let $\Psi$ be any convex function. If $p_0$ and $p_1$ are the Huber-Strassen least favorable densities corresponding to $f_0$ and $f_1$, then the inequality

$$\int \Psi\!\left[\frac{p_1(x)}{p_0(x)}\right] p_0(x)\, dx \le \int \Psi\!\left[\frac{f_1(x)}{f_0(x)}\right] f_0(x)\, dx$$

holds for all marginal densities $f_0 \in \mathcal{P}_0$ and $f_1 \in \mathcal{P}_1$.

Proof. It has been shown in [7] that the least favorable densities in terms of risk for the classes $\mathcal{P}_0$ and $\mathcal{P}_1$ are the Huber-Strassen densities. The proof then follows as a corollary to Lemma 1 in [15]. □


In addition to the $\epsilon$-contamination classes, the Huber-Strassen densities are also least favorable in terms of risk for at least three other uncertainty classes: the total variation classes [7], bounded classes [17], and $p$-point classes [18]. Thus Lemma 13 holds as well if the classes $\mathcal{P}_0$, $\mathcal{P}_1$ are both of one of these other three types.

The  main  result  of  this  chapter  is  stated  in  the  following  theorem. 
Theorem 14. The least favorable process distributions $F_0^*$, $F_1^*$ in the classes $Q_0$, $Q_1$ are such that their marginal densities are the Huber-Strassen densities (4.9) and their bivariate joint distributions are defined by (4.5).


4.2 Robustness for $S_1$

In the first section the idea of minimax robustness was discussed and the problem (4.1) posed without reference to a particular performance measure. We are now ready to find the solution to (4.1) with the performance measure $S$ taken to be $S_1$ as defined in (2.6). Our first task is to show that for arbitrary but fixed marginal densities, the bivariate distributions defined by (4.5) are in fact least favorable. For $i = 0, 1$, assume that the marginal density $f_i$ is fixed and denote by $\mathcal{R}_i$ the subset of $Q_i$ containing all the distributions $F_i$ which agree with the fixed marginal density. Let $F_i^*$ denote any such distribution in $\mathcal{R}_i$ having bivariate distributions defined by (4.5). To show that the distributions $F_0^*$ and $F_1^*$ are least favorable, we must show the inequalities (4.1). From the result (4.6) and the fact that $S_1$ is invariant under the scaling of $g$, it is clear that the left inequality is an equality for any allowable nonlinearity $g$. From (4.4) it is clear that the inequality

$$\sigma_1^2(g) \le (1 + 2R_1)\, \mathrm{Var}_1\, g(X_1) \qquad (4.10)$$

always holds for any distributions in the uncertainty class $Q_1$. But (4.6) implies that under the distributions $F_0^*$ and $F_1^*$, equality is obtained in (4.10) for arbitrary $g$, and in particular equality holds for $g^*$. These facts imply that the right inequality in (4.1) holds. Thus $F_0^*$ and $F_1^*$ are the least favorable distributions in the classes $\mathcal{R}_0$ and $\mathcal{R}_1$.
Now suppose we define a new performance measure $\tilde{S}_1$ by

$$\tilde{S}_1(g, f_0, f_1) = \frac{[\mu(g; f_1) - \mu(g; f_0)]^2}{\sigma^2(g; f_1)} \qquad (4.11)$$

where we introduce the new notation

$$\mu(g; f) = \int g(x) f(x)\, dx, \qquad \sigma^2(g; f) = \int g^2(x) f(x)\, dx - [\mu(g; f)]^2.$$

Such a performance measure depends only on the marginal densities. Suppose that we find $g^*$, $f_0^*$, $f_1^*$ which form a saddle point for $\tilde{S}_1$, with $f_0^*$ and $f_1^*$ being allowable marginal densities for the classes $Q_0$ and $Q_1$. Thus

$$\tilde{S}_1(g, f_0^*, f_1^*) \le \tilde{S}_1(g^*, f_0^*, f_1^*) \le \tilde{S}_1(g^*, f_0, f_1). \qquad (4.12)$$


Now if we consider the performance measure $S_1$ and $g^*$, $F_0^*$, $F_1^*$, where $F_i^*$ is a distribution having marginal density $f_i^*$ and bivariate distributions defined by (4.5), we find that we have a saddle point for the classes $Q_0$ and $Q_1$. Indeed, we find that

$$S_1(g, F_0^*, F_1^*) = \frac{\tilde{S}_1(g, f_0^*, f_1^*)}{1 + 2R_1} \le \frac{\tilde{S}_1(g^*, f_0^*, f_1^*)}{1 + 2R_1} = S_1(g^*, F_0^*, F_1^*).$$

Furthermore, we have

$$S_1(g^*, F_0^*, F_1^*) = \frac{\tilde{S}_1(g^*, f_0^*, f_1^*)}{1 + 2R_1} \le \frac{\tilde{S}_1(g^*, f_0, f_1)}{1 + 2R_1} = S_1(g^*, \tilde{F}_0, \tilde{F}_1) \le S_1(g^*, F_0, F_1)$$

where $\tilde{F}_i$ denotes the process distribution having $f_i$ as the marginal density and bivariate distributions given by (4.5), and $F_i$ denotes an arbitrary process distribution which has the marginal density $f_i$. Our conclusion now is that we need only solve the problem involving the performance measure $\tilde{S}_1$ and the marginal densities; that is, we must find the least favorable marginal densities $f_0^*$ and $f_1^*$ which together with $g^*$ solve the problem (4.12). Then the least favorable process distributions are such that they agree with the marginal densities $f_0^*$, $f_1^*$ and have bivariate distributions given by (4.5).

Our method will be as follows: First we find the solution $g^*$, $f_0^*$, $f_1^*$ to the minimax problem (4.2), where $f_0^*$ and $f_1^*$ are in the classes $Q_0'$ and $Q_1'$, which we define to be the classes of all marginal densities which are derived from the classes $Q_0$, $Q_1$. Recall that $Q_i'$ is an $\epsilon$-contamination class with nominal univariate density $f_i$ and parameter $\epsilon_i$. If the problem (4.1) has a solution, then necessarily it must be $g^*$, $f_0^*$, $f_1^*$. Thus at this point we have likely candidates for the robust nonlinearity and the least favorable densities. The second step is to show that the right inequality in (4.1) is satisfied; that is, we must show that

$$\tilde{S}_1(g^*, f_0^*, f_1^*) = \inf_{f_0, f_1} \tilde{S}_1(g^*, f_0, f_1). \qquad (4.13)$$

For given marginals $f_0$, $f_1$, the optimal nonlinearity $g$ is given by the solution of the integral equation (2.19) with $m = 0$, and it is easily verified that a solution is given by $g(x) = -f_0(x)/f_1(x)$. By Theorem 3, we have that

$$\tilde{S}_1(g, f_0, f_1) = \int g(x)[f_1(x) - f_0(x)]\, dx = \int \frac{f_0^2(x)}{f_1(x)}\, dx - 1. \qquad (4.14)$$

Thus for the first step, solving (4.2), we must minimize the rightmost integral in (4.14). Lemma 13 applies with $\Psi(x) = x^{-1}$, which is convex, since $\int \Psi[f_1(x)/f_0(x)] f_0(x)\, dx = \int f_0^2(x)/f_1(x)\, dx$. Thus our candidates for the least favorable marginal densities are the Huber-Strassen densities corresponding to the nominals $f_0$ and $f_1$.
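For a concrete instance of (4.14), take the assumed Gaussian shift pair $f_0 = N(0,1)$, $f_1 = N(\theta,1)$, for which $\int f_0^2/f_1\, dx = e^{\theta^2}$. The sketch below (helper names and the comparison nonlinearity are illustrative assumptions) evaluates (4.11) numerically for $g = -f_0/f_1$ and for the linear statistic $g(x) = x$; the former should attain $e^{\theta^2} - 1$, which is never smaller than the value $\theta^2$ of the linear statistic.

```python
import math

THETA = 1.0  # assumed shift between the nominal densities


def f0(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f1(x):
    return f0(x - THETA)


def integrate(fn, a=-15.0, b=16.0, n=30000):
    """Composite trapezoid rule; the Gaussian tails beyond [a, b] are negligible."""
    h = (b - a) / n
    return h * (0.5 * (fn(a) + fn(b)) + sum(fn(a + k * h) for k in range(1, n)))


def s1_tilde(g):
    """(4.11): [mu(g; f1) - mu(g; f0)]^2 / sigma^2(g; f1)."""
    mu0 = integrate(lambda x: g(x) * f0(x))
    mu1 = integrate(lambda x: g(x) * f1(x))
    var1 = integrate(lambda x: g(x) ** 2 * f1(x)) - mu1 ** 2
    return (mu1 - mu0) ** 2 / var1


def g_opt(x):
    """Matched nonlinearity -f0/f1 for the nominal pair."""
    return -f0(x) / f1(x)


def g_lin(x):
    """Comparison: the linear statistic."""
    return x


value_opt = s1_tilde(g_opt)
value_int = integrate(lambda x: g_opt(x) * (f1(x) - f0(x)))  # middle expression of (4.14)
value_lin = s1_tilde(g_lin)
print(f"optimal: {value_opt:.6f}  via (4.14): {value_int:.6f}  exp(theta^2) - 1: {math.exp(THETA ** 2) - 1:.6f}")
print(f"linear statistic: {value_lin:.6f}")
```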

We must now show that the right inequality in (4.1) is satisfied by $g^* = -f_0^*/f_1^*$, where $f_0^*$ and $f_1^*$ are the Huber-Strassen densities. A complete proof of this result has been published in [16], and therefore only a sketch of the proof will be given here. Lemma 15, whose proof is given in [13], will be used here as well as in the next section to show that certain functions are convex.

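Lemma 15. If $v_1 > 0$, $v_2 > 0$, and $0 \le \alpha \le 1$, then

$$\frac{[\alpha u_1 + (1 - \alpha) u_2]^2}{\alpha v_1 + (1 - \alpha) v_2} \le \alpha \frac{u_1^2}{v_1} + (1 - \alpha) \frac{u_2^2}{v_2}.$$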

By virtue of Lemma 15, the performance measure $\tilde{S}_1$ is convex in the densities $f_0$, $f_1$ for fixed $g$. This implies that the function $J(\alpha; f_0, f_1) = \tilde{S}_1[g^*, (1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1]$ is convex in $\alpha$. Furthermore, a necessary and sufficient condition for (4.13) to hold is that $\frac{d}{d\alpha} J(\alpha; f_0, f_1)\big|_{\alpha = 0} \ge 0$ for arbitrary $f_0$, $f_1$. However, it is also true from considering (4.14) that $f_0^*$ and $f_1^*$ minimize the functional $T[f_0, f_1] = \int f_0^2(x)/f_1(x)\, dx$. Since $T$ is also convex in $f_0$, $f_1$, we must have also the condition that $\frac{d}{d\alpha} T[(1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1]\big|_{\alpha = 0} \ge 0$. By considering these two derivatives, it can be shown that the condition that $f_0^*$, $f_1^*$ minimize $T$ is equivalent to the condition that they minimize $\tilde{S}_1(g^*, f_0, f_1)$. A similar proof is given in greater detail in the next section for the performance measure $S_3$.

4.3 Robustness for $S_3$

We will obtain in this section essentially the same result for the performance measure $S_3$ as that obtained in the preceding section for the performance measure $S_1$. The first task is to show that the problem of finding the least favorable distributions again reduces to a problem involving only the marginal densities. Define a new performance measure

$$\tilde{S}_3(g, f_0, f_1) = \frac{[\mu(g; f_1) - \mu(g; f_0)]^2}{\sigma^2(g; f_0) + A\,\sigma^2(g; f_1)} \qquad (4.15)$$
where $A = (1 + 2R_1)/(1 + 2R_0)$. Such a performance measure depends only on the marginal densities $f_0$, $f_1$. If $F_0$, $F_1$ are process distributions which have marginal densities $f_0$, $f_1$ and bivariate distributions defined by (4.5), then we have the relation

$$S_3(g, F_0, F_1) = \frac{\tilde{S}_3(g, f_0, f_1)}{1 + 2R_0}.$$

A series of equalities and inequalities similar to those at the beginning of Section 4.2 can be used to show that the problem (4.1) for the performance measure $S_3$ reduces to the problem involving only the performance measure $\tilde{S}_3$. Thus if one finds the least favorable marginal densities $f_0^*$, $f_1^*$ and the minimax robust nonlinearity $g^*$ for the performance measure $\tilde{S}_3$, then the minimax problem for $S_3$ is solved by taking the least favorable process distributions to be such that the marginal densities are $f_0^*$, $f_1^*$ and the bivariate distributions are given by (4.5).

The integral equation which yields the optimal nonlinearity for $\tilde{S}_3$ is similar to (3.27) with $m = 0$ except for the coefficient $A$:

$$g(x) = \frac{f_1(x) - f_0(x)}{f_0(x) + A f_1(x)} + \int \frac{f_0(x) f_0(y) + A f_1(x) f_1(y)}{f_0(x) + A f_1(x)}\, g(y)\, dy. \qquad (4.16)$$

We have immediately the form of the solution

$$g(x) = \frac{B_0 f_0(x) + B_1 f_1(x)}{f_0(x) + A f_1(x)} \qquad (4.17)$$

where $B_0 = \mu_0 - 1$ and $B_1 = A\mu_1 + 1$. If we consider the linear system of equations

$$\mu_0 = B_0 + 1 = \int g(x) f_0(x)\, dx, \qquad \mu_1 = \frac{B_1 - 1}{A} = \int g(x) f_1(x)\, dx$$


with $B_0$, $B_1$ as the unknowns, then we find that the system is singular, and consequently we may assign to $B_0$ the arbitrary value 0. This implies that

$$B_1 = \left[\int \frac{f_0(x) f_1(x)}{f_0(x) + A f_1(x)}\, dx\right]^{-1}. \qquad (4.18)$$
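The reduction from (4.16) to (4.17) and (4.18) can be checked numerically in the same spirit. The sketch assumes Gaussian nominals $f_0 = N(0,1)$, $f_1 = N(1,1)$ and an illustrative value $A = 1.5$; the helper names are hypothetical. It builds $g$ from (4.17) with $B_0 = 0$ and $B_1$ from (4.18), confirms that both equations of the singular linear system hold, and checks that the matched value of (4.15) coincides with $\int g(f_1 - f_0)\, dx$, the first expression in (4.19).

```python
import math

A = 1.5  # assumed value of (1 + 2 R1) / (1 + 2 R0)


def f0(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f1(x):
    return f0(x - 1.0)


def integrate(fn, a=-12.0, b=13.0, n=20000):
    """Composite trapezoid rule; tails beyond [a, b] are negligible for these densities."""
    h = (b - a) / n
    return h * (0.5 * (fn(a) + fn(b)) + sum(fn(a + k * h) for k in range(1, n)))


# (4.18) with the arbitrary choice B0 = 0
b0 = 0.0
b1 = 1.0 / integrate(lambda x: f0(x) * f1(x) / (f0(x) + A * f1(x)))


def g(x):
    """(4.17): [B0 f0(x) + B1 f1(x)] / [f0(x) + A f1(x)]."""
    return (b0 * f0(x) + b1 * f1(x)) / (f0(x) + A * f1(x))


mu0 = integrate(lambda x: g(x) * f0(x))
mu1 = integrate(lambda x: g(x) * f1(x))
var0 = integrate(lambda x: g(x) ** 2 * f0(x)) - mu0 ** 2
var1 = integrate(lambda x: g(x) ** 2 * f1(x)) - mu1 ** 2

s3_tilde = (mu1 - mu0) ** 2 / (var0 + A * var1)               # (4.15)
s3_first_form = integrate(lambda x: g(x) * (f1(x) - f0(x)))   # first form in (4.19)
print(f"B1 = {b1:.4f}   mu0 = {mu0:.6f}   mu1 = {mu1:.6f}   (B1 - 1)/A = {(b1 - 1) / A:.6f}")
print(f"S3~ = {s3_tilde:.6f}   integral of g (f1 - f0) = {s3_first_form:.6f}")
```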
If $g$ is the optimal nonlinearity which is matched to $f_0$, $f_1$, then we know that

$$\tilde{S}_3(g, f_0, f_1) = \int g(x)[f_1(x) - f_0(x)]\, dx = B_1(f_0, f_1) \int \frac{f_1(x)/f_0(x)}{1 + A f_1(x)/f_0(x)}\,[f_1(x) - f_0(x)]\, dx, \qquad (4.19)$$

where we have written $B_1$ as a function of $f_0$ and $f_1$ to remind us of the relation (4.18). Lemma 13 applies to the integral in (4.19) with $\Psi(x) = x(x - 1)/(Ax + 1)$, which is convex, so that the integral in (4.19) is minimized by the Huber-Strassen densities. Lemma 13 also applies to the integral in (4.18). In this case $V(x) = x/(Ax + 1)$, which is concave, so that by applying the lemma to the negative of the integral (since $-V$ is convex) we find that this integral is maximized by the Huber-Strassen densities. $B_1(f_0, f_1)$ is therefore minimized. Thus our candidates for the least favorable marginal densities are the Huber-Strassen densities. The right inequality in (4.1) will now be proved.

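The following inequalities, which depend on the fact that $\sigma^2(g; f)$ is concave in $f$ and on Lemma 15, demonstrate that $\tilde{S}_3(g, f_0, f_1)$ is convex in $f_0$ and $f_1$ for fixed $g$. With $\beta = 1 - \alpha$ and any second pair of densities $\hat{f}_0$, $\hat{f}_1$ we have

$$\begin{aligned}
\tilde{S}_3(g, \beta f_0 + \alpha \hat{f}_0, \beta f_1 + \alpha \hat{f}_1)
&= \frac{[\mu(g; \beta f_1 + \alpha \hat{f}_1) - \mu(g; \beta f_0 + \alpha \hat{f}_0)]^2}{\sigma^2(g; \beta f_0 + \alpha \hat{f}_0) + A\,\sigma^2(g; \beta f_1 + \alpha \hat{f}_1)} \\
&\le \frac{\left(\beta\{\mu(g; f_1) - \mu(g; f_0)\} + \alpha\{\mu(g; \hat{f}_1) - \mu(g; \hat{f}_0)\}\right)^2}{\beta[\sigma^2(g; f_0) + A\,\sigma^2(g; f_1)] + \alpha[\sigma^2(g; \hat{f}_0) + A\,\sigma^2(g; \hat{f}_1)]} \\
&\le \beta\, \tilde{S}_3(g, f_0, f_1) + \alpha\, \tilde{S}_3(g, \hat{f}_0, \hat{f}_1).
\end{aligned}$$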

Define the function

$$J(\alpha; f_0, f_1) = \tilde{S}_3[g^*, (1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1], \qquad 0 \le \alpha \le 1,$$

where $f_0^*$ and $f_1^*$ are the Huber-Strassen least favorable densities and $g^*$ is the optimal nonlinearity matched to $f_0^*$, $f_1^*$. Certainly $J$ is convex in $\alpha$ if $\tilde{S}_3$ is convex in $f_0$ and $f_1$.
Now the right inequality in (4.1) holds if and only if

$$J(\alpha; f_0, f_1) \ge J(0; f_0, f_1) \qquad (4.20)$$

for all $\alpha$ in the interval $[0, 1]$, and since $J$ is convex in $\alpha$, (4.20) holds if and only if we have the condition

$$\frac{d}{d\alpha} J(\alpha; f_0, f_1)\Big|_{\alpha = 0} \ge 0. \qquad (4.21)$$

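If we take the derivative of $J(\alpha; f_0, f_1)$ and set $\alpha = 0$, then, using $\int g^* f_0^* = 1$ and $A \int g^* f_1^* = B_1 - 1$ (from (4.17) with $B_0 = 0$), we have

$$\begin{aligned}
\frac{d}{d\alpha} J(\alpha; f_0, f_1)\Big|_{\alpha = 0}
&= 2\int g^*(f_1 - f_1^* - f_0 + f_0^*) - \int (g^*)^2 (f_0 + A f_1) + \int (g^*)^2 (f_0^* + A f_1^*) \\
&\quad + 2\left[\int g^* f_0^*\right]\left[\int g^*(f_0 - f_0^*)\right] + 2A\left[\int g^* f_1^*\right]\left[\int g^*(f_1 - f_1^*)\right] \\
&= 2B_1\left[\int g^* f_1 - \int g^* f_1^*\right] + \int (g^*)^2 (f_0^* + A f_1^*) - \int (g^*)^2 (f_0 + A f_1). \qquad (4.22)
\end{aligned}$$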

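We can now show that (4.21) holds by considering the function

$$T[f_0, f_1] = \int \frac{f_1^2(x)}{f_0(x) + A f_1(x)}\, dx, \qquad (4.23)$$

which by Lemma 13 (with $\Psi(x) = x^2/(1 + Ax)$, which is convex for $x \ge 0$) is minimized by the Huber-Strassen densities. Define

$$K(\alpha; f_0, f_1) = T[(1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1].$$

It follows from Lemma 15 that $T$ is convex in $f_0$, $f_1$. By the same reasoning as before, then, we conclude that $f_0^*$, $f_1^*$ minimize $T$ if and only if

$$\frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha = 0} \ge 0. \qquad (4.24)$$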
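The final step in our proof is to show that the inequality (4.24) implies the inequality (4.21). Define $p_i^{(\alpha)} = (1 - \alpha) f_i^* + \alpha f_i$ for $i = 0, 1$ and $0 \le \alpha \le 1$, so that we have

$$K(\alpha; f_0, f_1) = \int \frac{(p_1^{(\alpha)})^2}{p_0^{(\alpha)} + A p_1^{(\alpha)}}\, dx$$

and thus

$$\frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha = 0} = \lim_{\alpha \to 0} \int \frac{1}{\alpha}\left[\frac{(p_1^{(\alpha)})^2}{p_0^{(\alpha)} + A p_1^{(\alpha)}} - \frac{(p_1^{(0)})^2}{p_0^{(0)} + A p_1^{(0)}}\right] dx. \qquad (4.25)$$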


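The derivative of the integrand in (4.25) is

$$\frac{d}{d\alpha}\left[\frac{(p_1^{(\alpha)})^2}{p_0^{(\alpha)} + A p_1^{(\alpha)}}\right] = \frac{2 p_1^{(\alpha)}}{p_0^{(\alpha)} + A p_1^{(\alpha)}}(f_1 - f_1^*) + \left[\frac{p_1^{(\alpha)}}{p_0^{(\alpha)} + A p_1^{(\alpha)}}\right]^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)].$$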


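To differentiate $K$ we must justify the interchanging of the integration and differentiation operations. The convexity of the integrand as a function of $\alpha$ implies, for $0 < \alpha \le 1$, the inequalities

$$\begin{aligned}
&\frac{2 f_1^*}{f_0^* + A f_1^*}(f_1 - f_1^*) + \left(\frac{f_1^*}{f_0^* + A f_1^*}\right)^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)] \\
&\quad \le \frac{1}{\alpha}\left[\frac{(p_1^{(\alpha)})^2}{p_0^{(\alpha)} + A p_1^{(\alpha)}} - \frac{(p_1^{(0)})^2}{p_0^{(0)} + A p_1^{(0)}}\right]
\le \frac{(p_1^{(1)})^2}{p_0^{(1)} + A p_1^{(1)}} - \frac{(p_1^{(0)})^2}{p_0^{(0)} + A p_1^{(0)}}. \qquad (4.26)
\end{aligned}$$

The right quantity in (4.26) is integrable, and the middle quantity converges pointwise monotonically to the left quantity as $\alpha \to 0$ because of the convexity of the integrand. The monotone convergence theorem then permits the interchange of the differentiation and the integration, and we have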

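$$\frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha = 0} = \int \left\{\frac{2 f_1^*}{f_0^* + A f_1^*}(f_1 - f_1^*) + \left(\frac{f_1^*}{f_0^* + A f_1^*}\right)^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)]\right\} dx. \qquad (4.27)$$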
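Now if we compare equations (4.22) and (4.27), then we see that (compare (4.17), which gives $g^* = B_1 f_1^*/(f_0^* + A f_1^*)$ when $B_0 = 0$)

$$\frac{d}{d\alpha} J(\alpha; f_0, f_1)\Big|_{\alpha = 0} = B_1^2\, \frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha = 0},$$

and thus conditions (4.21) and (4.24) are equivalent.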
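The equivalence of (4.21) and (4.24) rests on the identity $\frac{d}{d\alpha} J\big|_{\alpha=0} = B_1^2 \frac{d}{d\alpha} K\big|_{\alpha=0}$, which holds whenever $g^*$ is the matched solution (4.17) with $B_0 = 0$; least favorability is not needed for the identity itself. The finite-difference sketch below checks it, taking $T[f_0, f_1] = \int f_1^2/(f_0 + A f_1)\, dx$ as the functional in (4.23). The matched pair $N(0,1)$, $N(1,1)$ playing the role of $f_0^*$, $f_1^*$, the perturbing Gaussians, the value $A = 1.5$, the grid, and the helper names are all illustrative assumptions.

```python
import math

A = 1.5  # assumed (1 + 2 R1) / (1 + 2 R0)
N, LO, HI = 10000, -12.0, 13.0
H = (HI - LO) / N
XS = [LO + k * H for k in range(N + 1)]
W = [H * (0.5 if k in (0, N) else 1.0) for k in range(N + 1)]  # trapezoid weights


def normal(mean, sd):
    """Gaussian density sampled on the grid XS."""
    return [math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi)) for x in XS]


def integral(values):
    return sum(w * v for w, v in zip(W, values))


f0s, f1s = normal(0.0, 1.0), normal(1.0, 1.0)  # matched pair (role of f0*, f1*)
f0, f1 = normal(0.5, 1.5), normal(1.2, 0.8)    # arbitrary perturbing pair

b1 = 1.0 / integral([a * b / (a + A * b) for a, b in zip(f0s, f1s)])  # (4.18)
g_star = [b1 * b / (a + A * b) for a, b in zip(f0s, f1s)]             # (4.17), B0 = 0


def mix(alpha, star, other):
    return [(1 - alpha) * s + alpha * o for s, o in zip(star, other)]


def J(alpha):
    """(4.15) at g*, along p_i = (1 - alpha) f_i* + alpha f_i."""
    p0, p1 = mix(alpha, f0s, f0), mix(alpha, f1s, f1)
    m0 = integral([g * p for g, p in zip(g_star, p0)])
    m1 = integral([g * p for g, p in zip(g_star, p1)])
    v0 = integral([g * g * p for g, p in zip(g_star, p0)]) - m0 ** 2
    v1 = integral([g * g * p for g, p in zip(g_star, p1)]) - m1 ** 2
    return (m1 - m0) ** 2 / (v0 + A * v1)


def K(alpha):
    """T[p0, p1] = integral of p1^2 / (p0 + A p1) along the same segment."""
    p0, p1 = mix(alpha, f0s, f0), mix(alpha, f1s, f1)
    return integral([b * b / (a + A * b) for a, b in zip(p0, p1)])


def right_derivative(fn, h=1e-4):
    """Second-order one-sided difference approximation to fn'(0)."""
    return (-3 * fn(0.0) + 4 * fn(h) - fn(2 * h)) / (2 * h)


dJ, dK = right_derivative(J), right_derivative(K)
print(f"dJ/dalpha = {dJ:.6f}   B1^2 dK/dalpha = {b1 ** 2 * dK:.6f}")
```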
CHAPTER 5


NUMERICAL  RESULTS  AND  CONCLUSION 


5.1  Description  of  the  Examples 

In the work of the preceding chapters, we attempted to justify the use of the various performance measures by showing that a large value of a performance measure results in good performance as determined by the actual error probabilities. Such an approach is necessary since the performance measures are mathematically tractable, whereas the error probabilities themselves are not. The error probabilities, however, can be estimated by simulation on a digital computer. Such simulation results, presented in this chapter for several different examples, will complete this work.

There are two questions about which we might hope to gain some insight from these computer simulations. The first, and perhaps foremost, concerns the validity of the assumptions which were made in justifying the various performance measures. In particular, we assumed that the distribution of the test statistic was approximately Gaussian, and in fact, under the hypotheses of Theorem 1 or Theorem 12 the distribution of the normalized test statistic converges to a Gaussian distribution as the sample size approaches infinity. Our tests have finite sample sizes, however, and therefore the effect of the finite sample size should be examined. The second question which warrants our attention is of a philosophical nature. The processes between which we wish to discriminate are dependent, and thus they necessarily involve memory. However, the tests with which we wish to perform such discrimination are memoryless, and therefore it is not intuitively clear that such a scheme can work effectively. In the case of mixing processes, where there is "asymptotic independence," we know that memoryless discrimination is possible, and that the performance will improve as the sample size increases. Our concern here is the improvement of a given memoryless discriminator over the discriminator which is designed under the assumption that the processes are iid (i.e., the LRT (1.2)).

In all of the examples which are presented here, the marginal densities are the same throughout, and the varying parameters are the time constants that govern the dependency lengths and the sample sizes of the tests. We assume that the process has a Rayleigh distribution with parameter $\theta = 4$ under hypothesis $H_0$ and a lognormal distribution with parameters $\lambda_1 = 0.8$, $\lambda_2^2 = 0.25$ under hypothesis $H_1$. The Rayleigh and lognormal densities are given in Appendices A and B, respectively. These distributions have found application in radar discrimination problems, and indeed such problems have been the motivation behind this research. The $n$-dimensional Rayleigh density generally lacks a closed-form expression when $n > 2$, and thus an LRT is not feasible. The parameters $\rho_j$ which appear in the expressions for both the Rayleigh bivariate densities (A2) and the lognormal bivariate densities (B4) are actually the correlation coefficients of the underlying Gaussian process(es). In each of the examples of this chapter, we assume that the values of the $\rho_j$ parameters are given by exponentially decaying sequences which are determined by a time constant $\tau_i$. Thus under hypothesis $H_i$ we have $\rho_j = \exp(-j/\tau_i)$, where $\rho_j$ is the parameter in the density $f_i$. The time constants are varied in the different examples to reveal the effects of varying degrees of dependency on the various test statistics.

Several comments concerning the choices for the aforementioned parameters are in order. First, the parameters for the marginal densities were chosen to match the two densities as closely as possible. More precisely, the parameters are such that $E_0 X_1 = E_1 X_1$ and $E_0 X_1^2 = E_1 X_1^2$ to within the rounding of $\lambda_1$ and $\lambda_2^2$; that is, the first and second moments agree. The graphs of the two marginal densities can be seen in Figures 1 and 2. From the linear plots in Figure 1 this appears to be a relatively difficult discrimination problem; however, the logarithmic plots in Figure 2 reveal that there is a great deal of discrimination capability in the tail regions, the Rayleigh density $f_0$ having a much heavier tail to the left and the lognormal density $f_1$ having a heavier tail to the right. Second, by taking the sequences of $\rho$ parameters to be exponential sequences, the underlying Gaussian processes become Markov processes, and thus methods for generating the processes on a computer become relatively simple.
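Because $\rho_j = \exp(-j/\tau)$ is geometric in $j$, each underlying Gaussian process can be generated as a first-order autoregression with coefficient $\exp(-1/\tau)$. The sketch below is an assumption about how such generation might be done, not the program used for this work: it builds the Rayleigh process from two independent Gauss-Markov sequences (Appendix A) and the lognormal process by exponentiating one (Appendix B). The function names are hypothetical, and $\theta = 4$, $\lambda_1 = 0.8$, $\lambda_2^2 = 0.25$ are used as illustrative parameter values.

```python
import math
import random

def gauss_markov(n, tau, mean=0.0, var=1.0, rng=random):
    """Stationary Gaussian AR(1) sequence with corr(X_k, X_{k+j}) = exp(-j / tau)."""
    rho = math.exp(-1.0 / tau)
    sd = math.sqrt(var)
    innovation_sd = sd * math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, sd)                  # start in the stationary distribution
    out = []
    for _ in range(n):
        out.append(mean + x)
        x = rho * x + rng.gauss(0.0, innovation_sd)
    return out

def rayleigh_process(n, tau, theta, rng=random):
    """Z_k = sqrt(X_k^2 + Y_k^2) with X, Y independent zero-mean Gauss-Markov sequences."""
    xs = gauss_markov(n, tau, var=theta, rng=rng)
    ys = gauss_markov(n, tau, var=theta, rng=rng)
    return [math.hypot(x, y) for x, y in zip(xs, ys)]

def lognormal_process(n, tau, lam1, lam2_sq, rng=random):
    """Z_k = exp(X_k) with X Gauss-Markov of mean lam1 and variance lam2_sq."""
    return [math.exp(x) for x in gauss_markov(n, tau, mean=lam1, var=lam2_sq, rng=rng)]

def lag_correlation(xs, lag):
    """Sample correlation between xs[k] and xs[k + lag]."""
    n = len(xs) - lag
    a, b = xs[:n], xs[lag:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)

if __name__ == "__main__":
    rng = random.Random(1)
    z0 = rayleigh_process(1000, 13.0288, theta=4.0, rng=rng)     # one H0 record
    z1 = lognormal_process(1000, 13.0288, 0.8, 0.25, rng=rng)    # one H1 record
    print(min(z0), max(z1))
```

Starting the recursion from the stationary distribution avoids a burn-in period, so every generated record is stationary from its first sample.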

As mentioned above, the parameters for the marginal densities remain the same for each of the specific examples considered. The time constants, however, are varied in the different examples. We assign a label $E_i$ to each of the examples for easy reference. Table 1 lists the parameters for each of the examples.

Table 1. Parameters for the examples.

Example     τ_0        τ_1       m_0    m_1      n
E1        13.0288    13.0288      60     60   1000
E2        13.0288   130.288       60    600   1000
E3       130.288     13.0288     600     60   1000
E4       130.288    130.288      600    600   1000
E5        13.0288   130.288       60    600    100


5.2  The Calculation of the Nonlinearities

For each of the examples $E_1, \ldots, E_5$, we shall compare the performance of five different nonlinearities $g_i$, $i = 0, \ldots, 4$. We denote by $g_i$, for $0 \le i \le 3$, the optimal nonlinearity for the performance measure $S_i$, which is consistent with our usage in the preceding chapters. We also denote by $g_4$ the optimal iid nonlinearity, given by $g_4 = \log(f_1/f_0)$ (cf. (1.2)). Thus we also have five different test statistics $T_i = \sum_{k=1}^{n} g_i(X_k)$, $i = 0, \ldots, 4$. The nonlinearity $g_4$ is easily computed since it has a closed form. To obtain the others, the corresponding integral equations from Chapters 2 and 3 must be solved.
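For illustration, the closed-form iid nonlinearity $g_4 = \log(f_1/f_0)$ and the memoryless statistic $T_4 = \sum_k g_4(X_k)$ can be coded directly from the marginal densities of Appendices A and B. The sketch assumes $\theta = 4$ for the Rayleigh parameter and $\lambda_1 = 0.8$, $\lambda_2^2 = 0.25$ for the lognormal parameters, works with log-densities so that neither tail underflows, and leaves the choice of threshold aside; the function names are hypothetical.

```python
import math

THETA = 4.0                  # assumed Rayleigh parameter under H0
LAM1, LAM2_SQ = 0.8, 0.25    # assumed lognormal parameters under H1

def log_f0(x):
    """Log of the Rayleigh marginal density (x > 0)."""
    return math.log(x) - math.log(THETA) - x * x / (2.0 * THETA)

def log_f1(x):
    """Log of the lognormal marginal density (x > 0)."""
    lx = math.log(x)
    return -(lx - LAM1) ** 2 / (2.0 * LAM2_SQ) - lx - 0.5 * math.log(2.0 * math.pi * LAM2_SQ)

def g4(x):
    """iid nonlinearity log(f1(x) / f0(x))."""
    return log_f1(x) - log_f0(x)

def memoryless_statistic(g, observations):
    """T = sum over k of g(X_k); decide H1 when T exceeds a threshold."""
    return sum(g(x) for x in observations)

if __name__ == "__main__":
    print(g4(0.05), g4(2.5), g4(15.0))
```

The sign pattern of $g_4$ mirrors the tail behavior noted for Figure 2: it is strongly negative near 0, where $f_0$ dominates, and strongly positive far to the right, where $f_1$ dominates.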

Several issues must be considered in the numerical calculation of the nonlinearities. First, because the integral equations are derived for $m$-dependent processes, we must assign a value to $m$. Our criterion for doing so is to select $m$ so that $\rho_m < \rho_{\min}$. Thus we have two values $m_0$, $m_1$ corresponding to the processes under the two hypotheses $H_0$, $H_1$. In the results presented here, $\rho_{\min} = 0.01$. These results were tested by decreasing the value of $\rho_{\min}$, or equivalently, increasing the values of $m_i$, and the numerical results were found to be unchanged, thus corroborating Theorems 6 and 11. The second issue is the choice of a finite interval $[x_{\min}, x_{\max}]$ over which the integration is to be performed. This amounts to truncating the densities, and is of special concern for the nonlinearities $g_0$ and $g_1$, since $f_1(x)/f_0(x)$ is unbounded as $x \to \infty$ and $f_0(x)/f_1(x)$ is unbounded as $x \to 0$. Thus for these cases the absolute term in the integral equation does not have a finite second moment, and it is therefore mandatory that the tails of the densities be modified or truncated for the problem to be well defined as a Fredholm equation. In other words, condition (a) at the beginning of Section 2.3 is not satisfied unless the tails of the densities are modified by truncation or some other method. For the problems here, the interval was chosen so that under either hypothesis

$$P\{X_1 < x_{\min}\} < \epsilon \quad\text{and}\quad P\{X_1 > x_{\max}\} < \epsilon, \qquad \epsilon = 5 \times 10^{-5}.$$

This resulted in $x_{\min} = 0.02$ and $x_{\max} = 15.7$.
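Both numerical choices in this paragraph can be reproduced in a few lines, assuming $\rho_j = \exp(-j/\tau_i)$, a Rayleigh parameter $\theta = 4$, and lognormal parameters $\lambda_1 = 0.8$, $\lambda_2 = 0.5$ (so $\lambda_2^2 = 0.25$). The sketch recovers $m = 60$ and $m = 600$ for the time constants 13.0288 and 130.288 of Table 1 (note that $13.0288 \approx 60/\ln 100$), and checks that every tail mass outside $[0.02, 15.7]$ is below $\epsilon = 5 \times 10^{-5}$. Function names are hypothetical.

```python
import math

RHO_MIN = 0.01
EPS = 5e-5
THETA, LAM1, LAM2 = 4.0, 0.8, 0.5   # assumed: Rayleigh theta; lognormal lambda_1, lambda_2

def dependency_length(tau, rho_min=RHO_MIN):
    """Smallest m with rho_m = exp(-m / tau) < rho_min."""
    m = 0
    while math.exp(-m / tau) >= rho_min:
        m += 1
    return m

def rayleigh_below(x, theta=THETA):
    return -math.expm1(-x * x / (2.0 * theta))     # P{Z < x}, accurate for small x

def rayleigh_above(x, theta=THETA):
    return math.exp(-x * x / (2.0 * theta))        # P{Z > x}

def lognormal_below(x, lam1=LAM1, lam2=LAM2):
    z = (math.log(x) - lam1) / lam2
    return 0.5 * math.erfc(-z / math.sqrt(2.0))    # Phi(z)

def lognormal_above(x, lam1=LAM1, lam2=LAM2):
    z = (math.log(x) - lam1) / lam2
    return 0.5 * math.erfc(z / math.sqrt(2.0))     # 1 - Phi(z) without cancellation

def tail_masses(x_min, x_max):
    return {
        "H0, below x_min": rayleigh_below(x_min),
        "H0, above x_max": rayleigh_above(x_max),
        "H1, below x_min": lognormal_below(x_min),
        "H1, above x_max": lognormal_above(x_max),
    }

if __name__ == "__main__":
    print([dependency_length(t) for t in (13.0288, 130.288)])
    for name, p in tail_masses(0.02, 15.7).items():
        print(f"{name}: {p:.3e}")
```

The binding constraints are the left tail of the Rayleigh density and the right tail of the lognormal density, which is consistent with the tail comparison made from Figure 2.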

The most direct method for solving a Fredholm equation is to approximate the integral with a numerical quadrature formula, and thereby transform the problem into a system of linear equations. In the method used here the quadrature formula was a composite Simpson's rule with $N = 301$ nodes. The argument goes as follows. First approximate the integral by a weighted sum:

$$g(x) \approx h(x) + \sum_{i=1}^{N} K(x, x_i)\, g(x_i)\, w_i. \qquad (5.1)$$

We find our numerical solution by solving the $N$ linear equations

$$u_j = h(x_j) + \sum_{i=1}^{N} K(x_j, x_i)\, u_i\, w_i \qquad (5.2)$$

for the $N$ unknowns $u_j$, $j = 1, \ldots, N$. If the numerical integration in (5.1) is reasonably accurate for all $x$ in the interval $[x_{\min}, x_{\max}]$, then the values $g(x_i)$ will solve a linear system similar to (5.2) but with slightly perturbed coefficients. Therefore, provided the coefficient matrix is not ill-conditioned, the solution of (5.2) will give a reasonable approximation to the solution of the integral equation. In fact, Fredholm proved that such approximations converge to the solution as $N$ approaches infinity. Obtaining a numerical solution for the integral equation (3.15) requires a little more effort, in that one must solve the linear equations for several different values of $r$ until the sequence of $r$ values appears to be near its limit. In the problems solved here, the initial value was $r = 1$, and convergence to within $10^{-4}$ occurred after 10 to 23 iterations for the four examples. For any of the integral equations, if the values of $m_0$ and $m_1$ are large, then typically most of the CPU time is spent in initializing the matrix for the linear system because of the sums in the kernels. For the integral equation (3.15), where the linear system must be solved several times, it is economical to save the values of these sums.
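The quadrature (Nyström) procedure of (5.1) and (5.2) can be sketched for a generic second-kind Fredholm equation $g(x) = h(x) + \int_a^b K(x, y)\, g(y)\,dy$. This is a minimal illustration, not the program used here: the absolute term $h$ and the kernel $K$ are supplied by the caller (the kernels of Chapters 2 and 3 are not reproduced), the dense solver is plain Gaussian elimination with partial pivoting, and all function names are hypothetical. The test uses $h(x) = x$, $K(x, y) = xy$ on $[0, 1]$, whose exact solution is $g(x) = 1.5x$.

```python
def simpson_rule(a, b, n_nodes):
    """Nodes and weights of the composite Simpson rule (n_nodes odd, >= 3)."""
    if n_nodes < 3 or n_nodes % 2 == 0:
        raise ValueError("composite Simpson's rule needs an odd number of nodes >= 3")
    step = (b - a) / (n_nodes - 1)
    nodes = [a + i * step for i in range(n_nodes)]
    weights = []
    for i in range(n_nodes):
        if i in (0, n_nodes - 1):
            c = 1.0
        elif i % 2 == 1:
            c = 4.0
        else:
            c = 2.0
        weights.append(c * step / 3.0)
    return nodes, weights

def solve_dense(matrix, rhs):
    """Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def nystrom_solve(h, kernel, a, b, n_nodes=301):
    """Solve u_j = h(x_j) + sum_i K(x_j, x_i) u_i w_i, i.e. (I - K W) u = h."""
    nodes, weights = simpson_rule(a, b, n_nodes)
    n = len(nodes)
    matrix = [[(1.0 if i == j else 0.0) - kernel(nodes[j], nodes[i]) * weights[i]
               for i in range(n)] for j in range(n)]
    rhs = [h(x) for x in nodes]
    return nodes, weights, solve_dense(matrix, rhs)

def nystrom_interpolate(x, h, kernel, nodes, weights, u):
    """Extend the nodal solution to any x with the quadrature formula (5.1)."""
    return h(x) + sum(kernel(x, xi) * ui * wi for xi, wi, ui in zip(nodes, weights, u))
```

With $N = 301$ the pure-Python elimination costs $O(N^3)$ operations and is slow; in practice a library linear solver would be used, while the matrix assembly is where cached kernel sums would be reused.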

Shown in Figures 3-6 are the graphs of the numerically computed nonlinearities for Example $E_2$. Figure 3 displays the nonlinearities $g_0$ and $g_4$, while Figure 4, which is drawn to a much smaller scale than Figure 3, displays the nonlinearities $g_1$, $g_2$, and $g_3$. Figures 5 and 6 are semilogarithmic plots which show the right and left tails of the nonlinearities. The tail behavior is of concern because, as we noted from Figure 2, this is where most of the discrimination capability lies. We might try to predict the performance of each of the test statistics by observing the shapes of the corresponding nonlinearities. For $g_0$, the heavy tail to the right will cause a separation of the means $\mu_0(g_0)$ and $\mu_1(g_0)$ at the expense of making $\sigma_1(g_0)$ rather large. Of course, $\sigma_0(g_0)$ will not be affected in a serious way by the right tail because, as can be seen in Figure 2, $f_0$ places little mass in that region. We notice the reverse situation for $g_1$, where the heavy tail on the left should create a separation of $\mu_0(g_1)$ and $\mu_1(g_1)$ at the expense of making $\sigma_0(g_1)$ large. Because $g_4$ has heavy tails on both the left and the right, we would expect $\mu_1(g_4) - \mu_0(g_4)$ to be large, as well as $\sigma_0(g_4)$ and $\sigma_1(g_4)$. Finally, we note that $g_2$ and $g_3$ do not have heavy tails to either the left or the right, and thus we expect $\mu_1 - \mu_0$ to be small, as well as $\sigma_0$ and $\sigma_1$, in each case. Table 2, which lists the values of these moments for each of the nonlinearities for Example $E_2$, shows that such predictions are accurate.

Figure 4. Linear graphs of the nonlinearities g_1, g_2, and g_3 for Example E2.

Table 2. Values of μ_i and σ_i evaluated at each of the nonlinearities for Example E2.

          μ_0           σ_0           μ_1           σ_1
g_0   -2.9643e-01    3.1015e+01    9.6166e+02    9.3037e+05
g_1   -1.2197e+09    1.3087e+11    9.1490e-06    3.4924e+04
g_2   -4.9429e-03    5.3269e-02    5.6676e-07    1.7041e-02
g_3   -8.7036e-03    7.9951e-02    1.7709e-06    4.8095e-02
g_4   -1.6660e-01    1.4059e+00    8.0028e-02    5.7433e+00

One final comment concerning the shapes of the nonlinearities is worth making. Because of the lopsided nature of the nonlinearities $g_0$ and $g_1$, the distributions of $g_0(X_1)$ and $g_1(X_1)$ will be skewed to the right and left, respectively. Thus convergence of the sums $T_0$ and $T_1$ to a Gaussian distribution will be slow. On the other hand, because the nonlinearities $g_2$ and $g_3$ are relatively small in magnitude and "balanced," the convergence of $T_2$ and $T_3$ to a Gaussian should be more rapid. The heavy tails on both the left and right of $g_4$ should cause $g_4(X_1)$ to be skewed to the left under $H_0$ and skewed to the right under $H_1$. Thus convergence of $T_4$ to a Gaussian distribution should also be rather slow. The same general phenomena occur for the other examples as well.

Tables 3-6 list the values of the performance measures evaluated at each of the nonlinearities. In each case the numerical solutions are consistent with the goal that $g_i$ maximize $S_i$ for $i = 0, 1, 2, 3$.
Table 3. Values of the performance measures evaluated at each of the nonlinearities for Example E1.

          S_0           S_1           S_2           S_3
g_0    9.6196e+02    1.0008e-05    1.0000e-05    1.0008e-05
g_1    8.6869e-05    7.0651e+10    8.6869e-05    8.6869e-05
g_2    2.2580e-02    9.3601e-02    1.0155e-02    1.8191e-02
g_3    3.0055e-02    5.4255e-02    9.8783e-03    1.9341e-02
g_4    3.0775e-02    2.4026e-02    6.7720e-03    1.3493e-02

Table 4. Values of the performance measures evaluated at each of the nonlinearities for Examples E2 and E5.

          S_0           S_1           S_2           S_3
g_0    9.6196e+02    1.0690e-06    1.0690e-06    1.0690e-06
g_1    8.6869e-05    1.2197e+09    8.6869e-05    8.6869e-05
g_2    8.6121e-03    8.4155e-02    4.9434e-03    7.8126e-03
g_3    1.1856e-02    3.2762e-02    4.6221e-03    8.7054e-03
g_4    3.0775e-02    1.8440e-03    1.1901e-03    1.7397e-03

Table 5. Values of the performance measures evaluated at each of the nonlinearities for Example E3.

          S_0           S_1           S_2           S_3
g_0    1.7417e+02    4.9968e-06    4.9951e-06    4.9968e-06
g_1    8.2661e-05    7.0651e+10    8.2661e-05    8.2661e-05
g_2    5.5079e-03    2.8253e-02    2.6506e-03    4.6093e-03
g_3    6.5927e-03    1.8214e-02    2.5700e-03    4.8406e-03
g_4    4.4120e-03    2.4026e-02    2.1620e-03    3.7275e-03

Table 6. Values of the performance measures evaluated at each of the nonlinearities for Example E4.

          S_0           S_1           S_2           S_3
g_0    1.7417e+02    8.9601e-07    8.9588e-07    8.9601e-07
g_1    8.2661e-05    1.2197e+09    8.2661e-05    8.2661e-05
g_2    1.8637e-03    6.0845e-02    1.3499e-03    1.8083e-03
g_3    3.0956e-03    8.4433e-03    1.2010e-03    2.2651e-03
g_4    4.4120e-03    1.8440e-03    6.8021e-04    1.3005e-03


5.3  Simulation Results


Figures 7-11 contain the graphs of the receiver operating characteristic (ROC) curves for each of the examples. These were generated by tabulating the results of 10,000 simulations under each hypothesis. Since the values of the nonlinearities were computed only for values in the interval $[x_{\min}, x_{\max}]$, a method had to be chosen to deal with observations outside the interval. This was resolved by limiting the observations, so that a value outside the interval was reset to $x_{\min}$ or $x_{\max}$, whichever was closer. The reasoning behind such a method is that real-world observations are in fact limited, since the instruments which make the measurements are limited. The endpoints $x_{\min}$ and $x_{\max}$ were selected so that the probability of an observation being larger than $x_{\max}$, for example, would be less than $5 \times 10^{-5}$, and the actual proportion of observations which occurred above $x_{\max}$ or below $x_{\min}$ in the simulations proved to be consistent with this probability. Thus the effect of this limiting is practically negligible.
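The limiting rule amounts to clipping each observation to $[x_{\min}, x_{\max}]$ before the tabulated nonlinearity is evaluated. The sketch below assumes, for illustration only, that values between quadrature nodes are obtained by linear interpolation; the text does not say how nodal values were interpolated, and the function names are hypothetical.

```python
import bisect

def tabulated_nonlinearity(nodes, values):
    """Return g(x): linear interpolation of nodal values, after limiting x to [nodes[0], nodes[-1]]."""
    if len(nodes) < 2 or len(nodes) != len(values):
        raise ValueError("need at least two nodes and one value per node")
    x_min, x_max = nodes[0], nodes[-1]

    def g(x):
        x = min(max(x, x_min), x_max)           # limiter: reset to the nearest endpoint
        k = bisect.bisect_right(nodes, x) - 1
        k = min(k, len(nodes) - 2)              # x == x_max uses the last panel
        t = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
        return (1.0 - t) * values[k] + t * values[k + 1]

    return g

def statistic(g, observations):
    """Memoryless test statistic T = sum over k of g(X_k)."""
    return sum(g(x) for x in observations)

if __name__ == "__main__":
    g = tabulated_nonlinearity([0.0, 1.0, 2.0], [0.0, 10.0, 40.0])
    print(statistic(g, [0.5, 1.5, 9.0]))        # 9.0 is limited to 2.0
```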

The ROCs in Figures 7-11 are plots of the error probability $P_1$ versus the error probability $P_0$ on logarithmic scales. Our main concern is the minimax point of the ROC, that is, the point where $P_0 = P_1$. This point lies on the diagonal extending from the lower left corner to the upper right corner of the graph, and it gives us an ordering of the nonlinearities. The approximate values of $P_0$ (and hence also $P_1$) at the minimax point are listed in Table 7. From Figure 7, which corresponds to Example $E_1$, we see that the iid nonlinearity $g_4$ performs uniformly better than the others, which is to be expected because the dependency under either hypothesis is relatively weak. The ordering, from best to worst, continues with $g_3$, $g_2$, $g_1$, and finally $g_0$.
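A minimax value like those in Table 7 can be read off from simulated statistics roughly as follows. The sketch assumes the rule "decide $H_1$ iff $T > \eta$", takes $P_0$ as the error probability under $H_0$ (deciding $H_1$) and $P_1$ as the error probability under $H_1$ (deciding $H_0$), and scans thresholds at the simulated values. With finitely many runs $P_0 = P_1$ can hold only approximately, so it returns the threshold minimizing $\max(P_0, P_1)$. The function name is hypothetical.

```python
import bisect

def minimax_point(t0, t1):
    """Return (eta, P0, P1) minimizing max(P0, P1) for the rule 'decide H1 iff T > eta'.

    t0: simulated statistics under H0; t1: simulated statistics under H1.
    """
    s0, s1 = sorted(t0), sorted(t1)
    n0, n1 = len(s0), len(s1)
    best = None
    for eta in sorted(set(s0) | set(s1)):
        p0 = (n0 - bisect.bisect_right(s0, eta)) / n0   # H0 runs with T > eta
        p1 = bisect.bisect_right(s1, eta) / n1          # H1 runs with T <= eta
        if best is None or max(p0, p1) < max(best[1], best[2]):
            best = (eta, p0, p1)
    return best

if __name__ == "__main__":
    print(minimax_point([0, 1, 2, 3], [2, 3, 4, 5]))
```

Sorting once and using binary search keeps the scan at $O(n \log n)$, which is comfortable for 10,000 runs per hypothesis.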

Figure 8 corresponds to $E_2$, where there is a relatively strong dependency under the hypothesis $H_1$. The ordering is $g_1$, $g_2$, $g_3$, $g_4$, and $g_0$, a result which is rather pleasing to the intuition. Since there is memory in the observations, the nonlinearity $g_4$, which is designed under a no-memory assumption, performs relatively poorly. The nonlinearity $g_1$, however, was designed to minimize $\sigma_1$, which essentially captures all of the dependency under $H_1$. Thus $g_1$ achieves its relatively good performance by minimizing the effects of the dependency, and this may perhaps be the only way to handle dependency when a memoryless discriminator is to be used.

Table 7. Approximate values of P_0 (and P_1) at the minimax region of the ROCs for each of the nonlinearities, as estimated through computer simulation.

Example     g_0        g_1        g_2        g_3        g_4
E1        4.1e-03    2.2e-03    1.1e-03    9.0e-04    7.0e-04
E2        1.1e-01    2.6e-03    4.3e-03    6.5e-03    3.5e-02
E3        4.0e-02    2.8e-02    2.4e-02    2.8e-02    2.3e-02
E4        1.8e-01    4.9e-02    4.4e-02    6.9e-02    9.8e-02
E5        3.7e-01    1.6e-01    1.7e-01    2.4e-01    1.7e-01

With this concept in mind, we now examine Figure 9, corresponding to $E_3$, in which the dependency under $H_0$ is relatively strong. Although we would expect a reversal of the situation of $E_2$, we find again that the ordering at the minimax point is $g_4$, $g_2$, $g_3$, $g_1$, $g_0$, which is like that of $E_1$ except that the positions of $g_2$ and $g_3$ are reversed. There is not a large difference in performance from best to worst, and in fact $g_1$ and $g_3$ are actually tied for the third position. As we proceed to the left of the curves from the minimax region, we find that there is a region where $g_1$ performs best, and finally a region where $g_0$ performs best. If we examine the situation a little more carefully, we may also find an intuitively pleasing explanation for this result. The correlation with which we are actually dealing is that of the underlying Gaussian process(es). The $\rho$ parameters in the bivariate densities in the appendices are the actual correlation coefficients of the Gaussian processes, but they are related to the correlation coefficients of the Rayleigh and lognormal processes in a one-to-one manner. Denote by $\rho_r(\rho)$ the correlation coefficient of the Rayleigh density as a function of the parameter $\rho$ from the bivariate density, and let $\rho_l(\rho)$ denote this function for the lognormal density. Then $\rho_l$ has the explicit form

$$\rho_l(\rho) = \frac{e^{\lambda_2^2 \rho} - 1}{e^{\lambda_2^2} - 1},$$

and we note that the derivative of $\rho_l$ at 0, namely $\lambda_2^2/(e^{\lambda_2^2} - 1)$, is positive. In fact, for $\lambda_2^2 = 0.25$ the derivative at 0 has the approximate value 0.88. Although there is no closed-form expression for $\rho_r$, one can show that its derivative at 0 is 0. The implication is that the correlation of the Rayleigh density for a given value of the parameter $\rho$ is much less than that of the lognormal density. This makes intuitive sense, since the Rayleigh process is built from two independent Gaussian processes, whereas the lognormal process involves only one. Thus for $E_3$ there is an increase in the dependency under $H_0$ compared to that for $E_1$, but this increase is perhaps not so significant that $g_0$ would perform better than $g_4$, as we might expect. What we observe instead is a degradation of the results for $E_1$ due to the increase in the dependency under $H_0$.
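The contrast between $\rho_l$ and $\rho_r$ can be made concrete numerically. Here $\rho_l(\rho) = (e^{\lambda_2^2\rho} - 1)/(e^{\lambda_2^2} - 1)$ is coded directly. For $\rho_r$ we assume the standard envelope result for two independent, identically correlated zero-mean Gaussian components (the setting of Appendix A), $\rho_r(\rho) = [\,{}_2F_1(-\tfrac12, -\tfrac12; 1; \rho^2) - 1\,]/(4/\pi - 1) \approx 0.915\rho^2$; this is a hypergeometric series rather than an elementary closed form, and it should be checked against a reference before being relied on. The function names are hypothetical.

```python
import math

def lognormal_corr(rho, lam2_sq):
    """rho_l(rho) = (exp(lam2^2 rho) - 1) / (exp(lam2^2) - 1)."""
    return math.expm1(lam2_sq * rho) / math.expm1(lam2_sq)

def lognormal_corr_slope_at_zero(lam2_sq):
    """Derivative of rho_l at rho = 0: lam2^2 / (exp(lam2^2) - 1)."""
    return lam2_sq / math.expm1(lam2_sq)

def rayleigh_corr(rho, max_terms=10000):
    """Assumed envelope correlation (2F1(-1/2, -1/2; 1; rho^2) - 1) / (4/pi - 1)."""
    z = rho * rho
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # ratio of consecutive terms of 2F1(a, b; c; z) with a = b = -1/2, c = 1
        term *= (k - 0.5) * (k - 0.5) / ((k + 1.0) * (k + 1.0)) * z
        total += term
        if term < 1e-16:
            break
    return (total - 1.0) / (4.0 / math.pi - 1.0)

if __name__ == "__main__":
    print(lognormal_corr_slope_at_zero(0.25))
    for rho in (0.1, 0.5, 0.9):
        print(rho, rayleigh_corr(rho), lognormal_corr(rho, 0.25))
```

Because the leading term of $\rho_r$ is quadratic in $\rho$, its slope at 0 vanishes, while $\rho_l$ starts out nearly linear with slope about 0.88 for $\lambda_2^2 = 0.25$.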

We have the ROCs corresponding to $E_4$ in Figure 10, where a nearly uniform ordering is $g_1$, $g_2$, $g_3$, $g_4$, $g_0$. This ordering is precisely that of $E_2$, although the differences in performance are not as large here. This is the situation in which the dependency under both hypotheses has been increased from $E_1$. From the discussion of the preceding paragraph, we know that the dependency under $H_1$ is stronger than that under $H_0$. Thus our discussion concerning the results from $E_2$ applies here, and we may interpret this as a case in which the performance from $E_2$ is degraded due to the increased dependency under $H_0$.

In Figure 11 we observe the results for $E_5$, where we hope to discern the change in performance from $E_2$ caused by taking a smaller sample size. We find here that the performance of each nonlinearity has declined, as is to be expected. In the minimax region the ordering is essentially the same as that in $E_2$, although there is a small region where $g_3$ performs best. Clearly, though, the performances of $g_1$, $g_2$, and $g_4$ are practically the same.
5.4  Conclusion


In conclusion, we may wish to consider again the questions which were posed at the beginning of this chapter. First, regarding the validity of the performance measures, the simulation results and the values of the performance measures do not necessarily correlate well. In other words, knowing the values of a particular performance measure for two given nonlinearities, we may not be able to predict which nonlinearity will perform better in the simulations. Indeed, the nonlinearity $g_4$ does not maximize any of the performance measures $S_i$, yet it has shown the best performance in Example $E_1$. This does not mean that the performance measures are not useful. On the contrary, they provide us with a method for calculating other nonlinearities which, as evidenced here, might perform better than the iid nonlinearity. Nor does this mean that the theory is flawed, since the tests used here had finite sample sizes, an aspect which was neglected in the theory. One consequence of the finite sample size is that the distributions of the test statistics are not truly Gaussian. In fact, the distributions of the test statistics $T_0$ and $T_1$ are strongly skewed. Although this might be undesirable from the standpoint of the theory alone, this phenomenon is not necessarily undesirable in practice. Consider the test statistic $T_1$, for example. The variance of $T_1$ under the hypothesis $H_0$ is extremely large compared to the variance under $H_1$ and compared to the difference between the means under the two hypotheses. However, because the distribution of $T_1$ is skewed to the left, most of the outliers under $H_0$ fall away from the threshold, and this results in generally good performance.

Second, concerning the performance of memoryless discriminators for dependent processes, it has been demonstrated that in a situation of weak dependency the iid discriminator can be difficult to improve upon, while in a situation of strong dependency the methods derived here show definite improvement over the iid discriminator. We might now conjecture as to how such an improvement might come about. If the dependency is strong under one of the hypotheses, say $H_1$, then maximizing $S_1$ will likely lead to an improvement. This is because maximizing $S_1$ will result in a small value of $\sigma_1$, the effect of which is to minimize the much stronger dependency condition under $H_1$. This conjecture seems to be corroborated by the simulation results from $E_2$, $E_4$, and $E_5$, where the dependency under $H_1$ is stronger than that under $H_0$. If, however, there is strong dependency under both of the hypotheses, then perhaps the best approach would be to maximize $S_2$ or $S_3$, since in this way both of the dependency conditions are minimized. The relevant example here is $E_4$, where $g_1$ still performs best. However, as we noted above, the dependency is still somewhat stronger under $H_1$ than under $H_0$. The basic premise of the method described here is that when using memoryless discriminators for dependent processes, the best one can do to deal with the dependency conditions is to minimize their effects. The performance measures $S_i$ provide a framework for doing so.

It is satisfying to observe in the simulation results that $g_2$ and $g_3$ perform comparably, though $g_2$ generally performs slightly better. This is important because the calculation of $g_3$ is much easier than the calculation of $g_2$. Considering the good overall performance of the nonlinearity $g_3$, especially when there is strong dependency under both hypotheses, and the relatively simple calculation needed to determine it, this nonlinearity might be preferred in most situations.

The theory which has been presented here is entirely based on central limit theory and the assumption of large sample sizes. Indeed, all of the performance measures derived here have been asymptotic performance measures. Admittedly, this approach might seem too narrow in its application to be useful in situations where a moderate or small sample size is required. However, when faced with the problem of discriminating between two possible sources for an observed random process in which there is correlation in the observations, there is little else that one can do other than the LRT. Therefore, the work here is significant in that it presents an alternative to the LRT: the memoryless decision rule. In situations where the LRT cannot be implemented and the observations decorrelate with time, one might be tempted to assume that the observations are iid, thereby leading to a memoryless discriminator. We have seen here several alternative memoryless discriminators, some of which might improve upon the iid discriminator. Furthermore, the simulation results have demonstrated that good results can be obtained for even moderate sample sizes. One possible area for future research might be to examine more thoroughly the performance for various sample sizes; we might also note that these memoryless discriminators are ideally suited for sequential discrimination. We have also seen how some of the results can be made robust. Robust discrimination is important in many applications where circumstances might vary from test to test, such as radar discrimination of targets. Thus these robustness results are also significant, and the application of these results to radar problems might also be an area for future research.
APPENDIX  A


The  Rayleigh  Distribution 


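Let $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$ be independent and identically distributed Gaussian random vectors such that $E X_i = E Y_i = 0$ and $\operatorname{Var} X_i = \operatorname{Var} Y_i = \theta_i$ for $i = 1, \ldots, n$. Suppose also that $E X_i X_j/\sqrt{\theta_i\theta_j} = E Y_i Y_j/\sqrt{\theta_i\theta_j} = \rho_{ij}$. Then the random vector $Z = (Z_1, \ldots, Z_n)$ defined by $Z_i = \sqrt{X_i^2 + Y_i^2}$ has a Rayleigh distribution. For $n > 2$, the $n$-dimensional density involves $n - 1$ iterated integrations and does not have a closed-form expression. The bivariate density for $(Z_i, Z_j)$ is

$$f_{Z_iZ_j}(u, v) = \frac{uv}{(1-\rho_{ij}^2)\,\theta_i\theta_j}\exp\left\{-\frac{1}{2(1-\rho_{ij}^2)}\left[\frac{u^2}{\theta_i} + \frac{v^2}{\theta_j}\right]\right\} I_0\!\left[\frac{\rho_{ij}\,uv}{(1-\rho_{ij}^2)\sqrt{\theta_i\theta_j}}\right], \qquad (u \ge 0,\ v \ge 0) \qquad (A1)$$

where $I_0$ is the modified Bessel function of the first kind of order 0. If the vectors $X$ and $Y$ are stationary, then $Z$ is stationary and the density for $(Z_i, Z_{i+j})$ can be written

$$f_{Z_iZ_{i+j}}(u, v) = \frac{uv}{(1-\rho_j^2)\,\theta^2}\exp\left\{-\frac{u^2 + v^2}{2(1-\rho_j^2)\,\theta}\right\} I_0\!\left[\frac{\rho_j\,uv}{(1-\rho_j^2)\,\theta}\right], \qquad (u \ge 0,\ v \ge 0) \qquad (A2)$$

where $\theta = \theta_i = \theta_{i+j}$ and $\rho_j = \rho_{i,i+j}$. The marginal density takes the form

$$f(u) = \frac{u}{\theta}\exp\left(-\frac{u^2}{2\theta}\right), \qquad (u \ge 0).$$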
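A quick consistency check of (A2): integrating the bivariate density over $v$ should return the marginal density of $u$. The sketch below does this with a composite Simpson rule on $[0, 40]$ for the illustrative values $u = 2$, $\theta = 4$, $\rho = 0.5$; $I_0$ is evaluated from its power series, which is adequate for the moderate arguments that occur here. Function names are hypothetical.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0 via its power series sum_k (x^2/4)^k / (k!)^2."""
    q = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def rayleigh_bivariate(u, v, theta, rho):
    """Stationary bivariate Rayleigh density (A2) for (Z_i, Z_{i+j}) with rho = rho_j."""
    s = 1.0 - rho * rho
    return (u * v / (s * theta * theta)
            * math.exp(-(u * u + v * v) / (2.0 * s * theta))
            * bessel_i0(rho * u * v / (s * theta)))

def rayleigh_marginal(u, theta):
    return u / theta * math.exp(-u * u / (2.0 * theta))

def simpson(f, a, b, intervals):
    """Composite Simpson rule with an even number of intervals."""
    step = (b - a) / intervals
    total = f(a) + f(b)
    for k in range(1, intervals):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0

if __name__ == "__main__":
    u, theta, rho = 2.0, 4.0, 0.5
    integral = simpson(lambda v: rayleigh_bivariate(u, v, theta, rho), 0.0, 40.0, 2000)
    print(integral, rayleigh_marginal(u, theta))
```

Setting $\rho = 0$ makes $I_0 = 1$, and (A2) factors into the product of two marginals, as it should for independent components.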


The moments of the Rayleigh random variable $Z$ are given by

$$E Z^n = (2\theta)^{n/2}\,\Gamma\!\left(\frac{n}{2} + 1\right), \qquad (n = 1, 2, \ldots)$$

where $\Gamma$ denotes the gamma function.
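The moment formula is easy to check numerically, and it also shows how a Rayleigh and a lognormal marginal can be matched in their first two moments. In the sketch below, $\theta = 4$ is an illustrative value, and the lognormal moments use the standard formula $E Z^n = \exp(n\lambda_1 + n^2\lambda_2^2/2)$ for $Z = \exp(X)$ with $X$ Gaussian of mean $\lambda_1$ and variance $\lambda_2^2$. Because $E Z^2/(E Z)^2 = 4/\pi$ for every Rayleigh $\theta$, the matched value is always $\lambda_2^2 = \ln(4/\pi) \approx 0.2416$. Function names are hypothetical.

```python
import math

def rayleigh_moment(n, theta):
    """E Z^n = (2 theta)^(n/2) Gamma(n/2 + 1)."""
    return (2.0 * theta) ** (n / 2.0) * math.gamma(n / 2.0 + 1.0)

def rayleigh_moment_numeric(n, theta, upper=60.0, intervals=6000):
    """Composite Simpson approximation of the integral of u^n f(u) over [0, upper]."""
    f = lambda u: u ** n * (u / theta) * math.exp(-u * u / (2.0 * theta))
    step = upper / intervals
    total = f(0.0) + f(upper)
    for k in range(1, intervals):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0

def lognormal_moment(n, lam1, lam2_sq):
    """Standard lognormal moment E Z^n = exp(n lam1 + n^2 lam2^2 / 2)."""
    return math.exp(n * lam1 + 0.5 * n * n * lam2_sq)

def matching_lognormal(theta):
    """Lognormal (lam1, lam2^2) with the same first two moments as a Rayleigh(theta)."""
    m1, m2 = rayleigh_moment(1, theta), rayleigh_moment(2, theta)
    lam2_sq = math.log(m2 / (m1 * m1))
    return math.log(m1) - 0.5 * lam2_sq, lam2_sq

if __name__ == "__main__":
    print(rayleigh_moment(1, 4.0), rayleigh_moment(2, 4.0))
    print(matching_lognormal(4.0))
```

For $\theta = 4$ the matched parameters come out near $\lambda_1 \approx 0.798$ and $\lambda_2^2 \approx 0.242$, which round to the values 0.8 and 0.25 used in Chapter 5.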




APPENDIX  B 


The  Lognormal  Distribution 


Let $X = (X_1, \ldots, X_n)$ be a Gaussian random vector with $E X_i = \lambda_{1i}$ and $\operatorname{Var} X_i = \lambda_{2i}^2$. Suppose also that the correlation coefficient of $X_i$ and $X_j$ is $\rho_{ij}$. Then the random vector $Z = (Z_1, \ldots, Z_n)$ defined by $Z_i = \exp(X_i)$ has a lognormal distribution. If $F_X$ and $F_Z$ denote the $n$-dimensional distribution functions for the Gaussian random variables and lognormal random variables, respectively, then for $z_1, \ldots, z_n > 0$ we have the relation

$$F_Z(z_1, \ldots, z_n) = F_X(\log z_1, \ldots, \log z_n). \qquad (B1)$$

We may therefore obtain a relation for the densities by differentiating, and this yields

$$f_Z(z_1, \ldots, z_n) = \frac{1}{z_1 \cdots z_n}\, f_X(\log z_1, \ldots, \log z_n). \qquad (B2)$$


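The explicit expression for the bivariate density for $(Z_i, Z_j)$ is

$$f_{Z_iZ_j}(u, v) = \frac{1}{2\pi uv\,\lambda_{2i}\lambda_{2j}\sqrt{1-\rho_{ij}^2}}\exp\left\{-\frac{1}{2(1-\rho_{ij}^2)}\left[\frac{(\log u - \lambda_{1i})^2}{\lambda_{2i}^2} - \frac{2\rho_{ij}(\log u - \lambda_{1i})(\log v - \lambda_{1j})}{\lambda_{2i}\lambda_{2j}} + \frac{(\log v - \lambda_{1j})^2}{\lambda_{2j}^2}\right]\right\}, \qquad (u > 0,\ v > 0) \qquad (B3)$$

If the random vector $X$ is stationary, then so is the vector $Z$, and the density for $(Z_i, Z_{i+j})$ can be written

$$f_{Z_iZ_{i+j}}(u, v) = \frac{1}{2\pi uv\,\lambda_2^2\sqrt{1-\rho_j^2}}\exp\left\{-\frac{(\log u - \lambda_1)^2 - 2\rho_j(\log u - \lambda_1)(\log v - \lambda_1) + (\log v - \lambda_1)^2}{2\lambda_2^2(1-\rho_j^2)}\right\}, \qquad (u > 0,\ v > 0) \qquad (B4)$$

where $\lambda_k = \lambda_{ki} = \lambda_{k,i+j}$ for $k = 1, 2$ and $\rho_j = \rho_{i,i+j}$. The marginal density has the form

$$f(u) = \frac{1}{\sqrt{2\pi}\,\lambda_2 u}\exp\left\{-\frac{(\log u - \lambda_1)^2}{2\lambda_2^2}\right\}, \qquad (u > 0).$$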


The  moments  of  the  lognormal  random  variable  Z  are  given  by 